Regular Expression (regex or regexp for short) is a powerful tool for searching and manipulating text strings, particularly when processing text files. One line of regex can often replace several dozen lines of program code.

Regex is supported in the major scripting languages (such as Perl, Python, PHP, and JavaScript), in general-purpose programming languages such as Java, and even in word processors such as Word for searching text. Getting started with regex may not be easy due to its terse syntax, but it is certainly worth the investment of your time.

Regex By Examples

This section is meant for those who need to refresh their memory. Novices should go to the next section to learn the syntax before looking at these examples.

Regex Syntax Summary
Example: Numbers [0-9]+ or \d+
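The pattern above can be tried out with a minimal Python sketch. The sample input string is an assumption chosen only for illustration; the calls used (finditer and sub) are from Python's standard re module.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "abc00123xyz456_0"
digits = re.compile(r"[0-9]+")  # for ASCII digits this behaves like \d+

# Collect every match together with its start and end index.
spans = [(m.group(), m.start(), m.end()) for m in digits.finditer(text)]
for value, start, end in spans:
    print(f'found "{value}" starting at index {start} and ending at index {end}')

# Replace only the first match, then every match.
first_replaced = digits.sub("**", text, count=1)
all_replaced = digits.sub("++", text)
print(first_replaced)
print(all_replaced)
```

Note that in Python 3, \d applied to a str also matches non-ASCII decimal digits, whereas [0-9] does not.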
Code Examples (Python, Java, JavaScript, Perl, PHP)

Code Example in Python

Python supports regex via the module re. Python also uses backslash (\) for escape sequences in strings (i.e., you need to write \\ for \, and \\d for \d), but it supports raw strings in the form of r'...', which ignore the interpretation of escape sequences - great for writing regex.

Code Example in Java

See "Regular Expressions (Regex) in Java" for full coverage.
Java supports regex in package java.util.regex.

The output is:

    find() found substring "00123" starting at index 3 and ending at index 8
    find() found substring "456" starting at index 11 and ending at index 14
    find() found substring "0" starting at index 15 and ending at index 16
    matches() found nothing
    lookingAt() found nothing
    abc**xyz456_0
    abc++xyz++_++

Code Example in Perl

Perl makes extensive use of regular expressions, with many built-in syntaxes and operators. In Perl (and JavaScript), a regex is delimited by a pair of forward slashes (default), in the form of /regex/. You can use built-in operators:
- m/regex/modifier: match against the regex.
- s/regex/replacement/modifier: substitute the matched substring(s) with the replacement.
In Perl, you can use a single-quoted non-interpolating string '...' to write a regex, which disables interpretation of backslash (\) by Perl.

Code Example in JavaScript

In JavaScript (and Perl), a regex is delimited by a pair of forward slashes, in the form of /.../. There are two sets of methods, issued via a RegExp object or a String object.

Code Example in PHP

[TODO]

Example: Full Numeric Strings ^[0-9]+$ or ^\d+$
Example: Positive Integer Literals [1-9][0-9]*|0 or [1-9]\d*|0
Example: Full Integer Literals ^[+-]?([1-9][0-9]*|0)$ or ^[+-]?([1-9]\d*|0)$
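Alternation (|) has the lowest precedence in a regex, so anchors written outside a group attach to only one alternative. The following Python sketch compares an ungrouped form with a grouped form; the sample strings are assumptions chosen for illustration.

```python
import re

# Parsed as (^[+-]?[1-9][0-9]*) OR (0$): each anchor belongs to one branch only.
ungrouped = re.compile(r"^[+-]?[1-9][0-9]*|0$")
# Parentheses confine the alternation, so ^ and $ apply to the whole literal.
grouped = re.compile(r"^[+-]?([1-9][0-9]*|0)$")

samples = ["0", "-42", "+7", "12abc", "abc0", "007"]
results = {s: (bool(ungrouped.search(s)), bool(grouped.search(s))) for s in samples}

for s, (loose, strict) in results.items():
    print(f"{s!r}: ungrouped={loose}, grouped={strict}")
```

The ungrouped form accepts "12abc" (the first branch is anchored only at the start) and "abc0" (the second branch is anchored only at the end), while the grouped form rejects both.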
Example: Identifiers (or Names) [a-zA-Z_][0-9a-zA-Z_]* or [a-zA-Z_]\w*
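As a rough illustration, the identifier pattern can be used to validate whole strings in Python. The helper name is_identifier and the sample names are hypothetical. The explicit bracket form is used here because Python's \w also matches non-ASCII word characters in str patterns.

```python
import re

identifier = re.compile(r"[a-zA-Z_][0-9a-zA-Z_]*")

def is_identifier(name: str) -> bool:
    # fullmatch() requires the whole string to match, so no ^...$ is needed.
    return identifier.fullmatch(name) is not None

checks = {name: is_identifier(name) for name in ["count", "_tmp1", "2fast", "my-var", ""]}
print(checks)
```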
Example: Image Filenames ^\w+\.(gif|png|jpg|jpeg)$
Example: Email Addresses ^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$
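A small Python sketch of how this email pattern behaves on a few inputs follows. The helper name and the sample addresses are assumptions. The pattern is a simplification, not a complete validator, and its nested repetition can backtrack heavily on long non-matching inputs.

```python
import re

email = re.compile(r"^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$")

def looks_like_email(address: str) -> bool:
    # fullmatch() also rejects a trailing newline, which $ alone would tolerate.
    return email.fullmatch(address) is not None

samples = ["john.smith@example.com", "user@localhost", "a..b@example.com", "user@example.info"]
verdicts = {s: looks_like_email(s) for s in samples}
print(verdicts)
```

The last sample is rejected because the pattern only allows 2 or 3 characters in the final domain label, which shows one limitation of this regex.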
Exercise: Interpret this regex, which provides another representation of an email address: [regex missing in source].

Example: Swapping Words using Parenthesized Back-References ^(\S+)\s+(\S+)$ and $2 $1
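In Python, the same swap can be sketched with re.sub, where the replacement string refers to the captured groups as \1 and \2 instead of $1 and $2. The function name swap_two_words is hypothetical.

```python
import re

def swap_two_words(text: str) -> str:
    # \2 \1 emits the second captured word, a space, then the first.
    return re.sub(r"^(\S+)\s+(\S+)$", r"\2 \1", text)

swapped = swap_two_words("apple orange")
collapsed = swap_two_words("hello   world")
unchanged = swap_two_words("one two three")  # three words: the pattern does not match
print(swapped, collapsed, unchanged, sep="\n")
```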
Code Example in Python

Python keeps the parenthesized back-references in \1, \2, ...; in a replacement string, \g<0> refers to the entire match.

Code Example in Java

Java keeps the parenthesized back-references in $1, $2, ....

Example: HTTP Addresses ^http:\/\/\S+(\/\S+)*(\/)?$
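A minimal Python sketch of the HTTP address pattern follows, with a hypothetical helper name and sample URLs. The escaped slashes (\/) are needed in Perl and JavaScript, where / delimits the regex; Python accepts them but does not require them.

```python
import re

http_url = re.compile(r"^http:\/\/\S+(\/\S+)*(\/)?$")

def is_http_address(url: str) -> bool:
    return http_url.fullmatch(url) is not None

url_checks = {
    "http://www.example.com/docs/index.html": is_http_address("http://www.example.com/docs/index.html"),
    "http://www.example.com/": is_http_address("http://www.example.com/"),
    "https://www.example.com": is_http_address("https://www.example.com"),
    "http://exa mple.com": is_http_address("http://exa mple.com"),
    "http://": is_http_address("http://"),
}
print(url_checks)
```

Since \S+ already matches slashes, the trailing (\/\S+)*(\/)? groups do not change which strings are accepted.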
Example: Regex Patterns in AngularJS

The following rather complex regex patterns are used by AngularJS in JavaScript syntax:

Example: Sample Regex in Perl

Regular Expression (Regex) Syntax

A Regular Expression (or regex) is a pattern (or filter) that describes a set of strings that match the pattern. In other words, a regex accepts a certain set of strings and rejects the rest.

A regex consists of a sequence of characters, metacharacters (such as ., \d, \D, \s, \S, \w, \W) and operators (such as +, *, ?, |, ^). It is constructed by combining many smaller sub-expressions.

Matching a Single Character

The fundamental building blocks of a regex are patterns that match a single character. Most characters, including all letters (a-z and A-Z) and digits (0-9), match themselves.
For example, the regex x matches the substring "x"; z matches "z"; and 9 matches "9".

Non-alphanumeric characters without special meaning in regex also match themselves. For example, = matches "="; @ matches "@".
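Regex Special Characters and Escape Sequences

Regex's Special Characters

These characters have special meaning in regex (they are discussed in detail in later sections):
- metacharacter: dot (.)
- bracket list: [ ]
- position anchors: ^, $
- occurrence indicators: +, *, ?, { }
- parentheses: ( )
- or: |
- escape and metacharacter: backslash (\)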
Escape Sequences

The characters listed above have special meanings in regex. To match one of these characters literally, prepend it with a backslash (\); the result is known as an escape sequence. For example, \+ matches "+"; \[ matches "["; and \. matches ".".

Regex also recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage-return, \nnn for an octal number of up to 3 digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \Uhhhhhhhh for an 8-digit Unicode code point (support for the last few forms varies by language).
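The effect of escaping can be sketched in Python as follows. The sample strings are assumptions; re.escape is a standard-library helper that backslash-escapes special characters in literal text.

```python
import re

price = re.search(r"\$\d+\.\d\d", "Total: $19.99")   # \$ and \. match literally
unescaped = re.findall(r"a.c", "abc a.c")             # a bare dot matches any character
escaped = re.findall(r"a\.c", "abc a.c")              # \. matches only a literal dot

# re.escape() prepares arbitrary literal text for use inside a regex.
literal = re.escape("1+1=2?")
found = re.search(literal, "is 1+1=2? yes")

# Without a raw string, every regex backslash must be doubled in the literal.
same_pattern = "\\d+" == r"\d+"

print(price.group(), unescaped, escaped, found.group(), same_pattern)
```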
Code Example in Python

Code Example in JavaScript

[TODO]

Code Example in Java

[TODO]

Matching a Sequence of Characters (String or Text)

Sub-Expressions

A regex is constructed by combining many smaller sub-expressions or atoms. For example, the regex Friday matches the string "Friday". The matching, by default, is case-sensitive, but can be set to case-insensitive via a modifier.

OR (|) Operator

You can provide alternatives using the "OR" operator, denoted by a vertical bar (|). For example, the regex four|for|floor|4 accepts the strings "four", "for", "floor" or "4".

Bracket List (Character Class) [...], [^...], [.-.]

A bracket expression is a list of characters enclosed by [ ], also called a character class. It matches ANY ONE character in the list.
However, if the first character of the list is the caret (^), then it matches ANY ONE character NOT in the list. For example, the regex [02468] matches a single digit 0, 2, 4, 6, or 8; the regex [^02468] matches any single character other than 0, 2, 4, 6, or 8.

Instead of listing all characters, you could use a range expression inside the bracket. A range expression consists of two characters separated by a hyphen (-). It matches any single character that sorts between the two characters, inclusive. For example, [a-d] is the same as [abcd]. You could include a caret (^) in front of the range to invert the matching. For example, [^a-d] is equivalent to [^abcd].
Most of the special regex characters lose their meaning inside a bracket list and can be used as they are, except ^, -, ] and \.
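The bracket-list rules above can be checked with a short Python sketch; the sample strings are assumptions. The last pattern shows one common way to include the four exceptional characters: escape ] and \, keep ^ away from the first position, and put - last.

```python
import re

even_digits = re.findall(r"[02468]", "a1b2c3d4")   # any one of the listed digits
not_even = re.findall(r"[^02468]", "1234")         # any one character NOT listed
in_range = re.findall(r"[a-d]", "abcdefg")         # range a..d, inclusive

# Most special characters are literal inside a bracket list.
specials = re.findall(r"[.+*?]", "1+2*3.0?")

# ], \, ^ and - need care: escaped, escaped, not first, and last respectively.
tricky = re.findall(r"[\]\\^-]", r"a]b\c^d-e")

print(even_digits, not_even, in_range, specials, tricky)
```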
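Named Character Classes in Bracket List (For Perl Only?)

Named (POSIX) classes of characters are pre-defined within bracket expressions. They are:
- [:alnum:], [:alpha:], [:digit:]: letters and digits, letters, digits.
- [:xdigit:]: hexadecimal digits.
- [:lower:], [:upper:]: lowercase and uppercase letters.
- [:punct:]: punctuation characters.
- [:blank:]: space and tab.
- [:space:]: all whitespace characters, including newline and carriage-return.
- [:cntrl:]: control characters.
- [:graph:], [:print:]: visible characters, and visible characters plus space.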
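For example, [[:alnum:]] means [0-9A-Za-z]. (Note that the square brackets in these class names are part of the symbolic names, and must be included in addition to the square brackets delimiting the bracket list.)

Metacharacters ., \w, \W, \d, \D, \s, \S

A metacharacter is a symbol with a special meaning inside a regex.
- The dot (.) matches any single character except newline.
- \w (word character) matches any single letter, digit or underscore (same as [a-zA-Z0-9_] for ASCII text); \W (non-word character) matches any single character not matched by \w.
- \d (digit) matches any single digit (same as [0-9] for ASCII text); \D (non-digit) matches any single character not matched by \d.
- \s (space) matches any single whitespace character, such as blank, tab and newline; \S (non-space) matches any single character not matched by \s.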
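A quick Python sketch of these metacharacters on a made-up sample string follows (the string is an assumption). Note that Python's re module does not support the POSIX named classes such as [[:alnum:]].

```python
import re

text = "Room 42, Level_3\tEast"  # contains spaces, a comma and a tab

digits = re.findall(r"\d+", text)      # runs of digits
words = re.findall(r"\w+", text)       # runs of letters, digits and underscore
whitespace = re.findall(r"\s", text)   # each whitespace character
non_word = re.findall(r"\W+", text)    # runs of everything that is not \w
dot = re.findall(r"R..m", text)        # each dot stands for any one character

print(digits, words, whitespace, non_word, dot)
```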
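Backslash (\) and Regex Escape Sequences

Regex uses backslash (\) for two purposes:
- for metacharacters such as \d (digit), \D (non-digit), \s (space), \S (non-space), \w (word), \W (non-word);
- to escape special regex characters, e.g., \. for ., \+ for +, \* for *, \? for ?.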
Take note that in many programming languages (C, Java, Python), backslash (\) is also used for escape sequences in strings, e.g., "\n" for newline and "\t" for tab, and you need to write "\\" for a single \. Consequently, to write the regex pattern \\ (which matches one \) in these languages, you need to write "\\\\" (two levels of escape). Similarly, you need to write "\\d" for the regex metacharacter \d. This is cumbersome and error-prone.

Occurrence Indicators (Repetition Operators): +, *, ?, {m}, {m,n}, {m,}

A regex sub-expression may be followed by an occurrence indicator (aka repetition operator):
- ?: the preceding item is optional and matched at most once (0 or 1 times).
- *: the preceding item is matched zero or more times.
- +: the preceding item is matched one or more times.
- {m}: the preceding item is matched exactly m times.
- {m,}: the preceding item is matched m or more times.
- {m,n}: the preceding item is matched at least m times, but not more than n times.
For example, the regex xy{2,4} accepts "xyy", "xyyy" and "xyyyy".

Modifiers

You can apply modifiers to a regex to tailor its behavior, such as global, case-insensitive, multiline, etc. The ways to apply modifiers differ among languages.

In Perl, you can attach modifiers after a regex, in the form of /.../modifiers. In Java, you apply modifiers when compiling the regex Pattern, e.g., Pattern.compile(regex, Pattern.CASE_INSENSITIVE).

The commonly-used modifier modes are:
- Case-insensitive mode (i): match letters regardless of case.
- Global (g, in Perl and JavaScript): find all matches instead of stopping at the first.
- Multiline mode (m): ^ and $ also match at the start and end of each line, not only of the whole input.
- Single-line mode (s): the dot (.) also matches newline.
- Comment mode (x): whitespace and embedded comments in the regex are ignored.
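In Python, the modifiers are passed as flags from the re module (or written inline, such as (?i)). The sketch below uses an assumed three-line sample string.

```python
import re

text = "First line\nsecond LINE\nthird line"

case_sensitive = re.findall(r"line", text)
case_insensitive = re.findall(r"line", text, re.IGNORECASE)   # i modifier
inline_flag = re.findall(r"(?i)line", text)                    # same, written inline

only_first = re.findall(r"^\w+", text)                  # ^ matches at start of input only
line_starts = re.findall(r"^\w+", text, re.MULTILINE)   # m modifier: ^ at every line start

dot_default = re.search(r"First.*third", text)              # dot stops at newline
dot_all = re.search(r"First.*third", text, re.DOTALL)       # s modifier: dot spans newlines

print(case_sensitive, case_insensitive, only_first, line_starts)
```

Python has no global modifier; the choice between re.search (first match) and re.findall or re.finditer (all matches) plays that role.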
Greediness, Laziness and Backtracking for Repetition Operators

Greediness of Repetition Operators *, +, ?, {m,n}: The repetition operators are greedy, and by default grab as many characters as possible for a match. For example, the regex xy{2,4} tries to match "xyyyy" first, then "xyyy", and then "xyy".

Lazy Quantifiers *?, +?, ??, {m,n}?, {m,}?: You can put an extra ? after a repetition operator to curb its greediness (i.e., stop at the shortest match). For example, given the input "The <code>first</code> and <code>second</code> instances", the regex <code>.*</code> matches "<code>first</code> and <code>second</code>", whereas <code>.*?</code> matches only "<code>first</code>".

Backtracking: If a regex reaches a state where a match cannot be completed, it backtracks by unwinding one character from the greedy match. For example, if the regex z*zzz is matched against the string "zzzz", the z* first matches "zzzz"; unwinds to match "zzz"; unwinds to match "zz"; and finally unwinds to match "z", such that the rest of the pattern can find a match.
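Greedy matching, lazy matching and backtracking can be observed with a short Python sketch. The HTML-like sample string is an assumption used only to make the difference visible.

```python
import re

html = "The <code>first</code> and <code>second</code> instances"

greedy = re.search(r"<code>.*</code>", html).group()   # runs to the LAST </code>
lazy = re.search(r"<code>.*?</code>", html).group()    # stops at the FIRST </code>

greedy_count = re.match(r"xy{2,4}", "xyyyyy").group()   # takes as many y as allowed
lazy_count = re.match(r"xy{2,4}?", "xyyyyy").group()    # takes as few y as allowed

# z* first grabs all four z's, then gives back three so that zzz can match.
backtracked = re.fullmatch(r"z*zzz", "zzzz")

print(greedy, lazy, greedy_count, lazy_count, sep="\n")
```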
Possessive Quantifiers *+, ++, ?+, {m,n}+, {m,}+: You can put an extra + after a repetition operator to disable backtracking, even if it may result in match failure. For example, z++z will not match "zzzz". This feature might not be supported in some languages.

Position Anchors ^, $, \b, \B, \<, \>, \A, \Z

Positional anchors DO NOT match actual characters, but match positions in a string, such as start-of-line, end-of-line, start-of-word, and end-of-word.
- ^ and $: start-of-line and end-of-line, respectively.
- \b: a word boundary (start-of-word or end-of-word); \B: the inverse of \b, i.e., not a word boundary.
- \< and \>: start-of-word and end-of-word, respectively (not supported in all languages).
- \A and \Z: start-of-input and end-of-input, respectively.
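The anchors can be illustrated in Python with the sketch below, using an assumed two-line sample string. Python's re module supports ^, $, \b, \B, \A and \Z, but not \< and \>.

```python
import re

text = "cat concat cats\ncat"

# \bcat\b: "cat" as a whole word; \Bcat\b: "cat" at the end of a longer word.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]
word_endings = [m.start() for m in re.finditer(r"\Bcat\b", text)]

string_start = re.findall(r"^cat", text)                 # start of the input only
line_starts = re.findall(r"^cat", text, re.MULTILINE)    # start of every line

end_default = re.findall(r"cats$", text)                 # "cats" is not at end of input
end_multiline = re.findall(r"cats$", text, re.MULTILINE) # but it is at end of a line

string_end = re.search(r"cat\Z", text)                   # very end of the input

print(whole_words, word_endings, line_starts, end_multiline, string_end.start())
```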
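Capturing Matches via Parenthesized Back-References & Matched Variables $1, $2, ...

Parentheses ( ) serve two purposes in regex: they group sub-expressions (to override precedence or to apply a repetition operator to the whole group), and they capture the substring matched by the group as a back-reference.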
These back-references (or capturing groups) are stored in special variables $1, $2, ... (or \1, \2, ...), where $1 contains the substring matched by the first pair of parentheses, and so on. For example, (\S+)\s+(\S+) creates two back-references which match the first two words. The matched words are stored in $1 and $2 (or \1 and \2), respectively.

Back-references are important for manipulating strings. They can be used in the substitution string as well as in the pattern.

(Advanced) Lookahead/Lookbehind, Groupings and Conditional

These features might not be supported in some languages.

Positive Lookahead (?=pattern)

The (?=pattern) is known as positive lookahead. It performs the match, but does not capture the match, returning only the result: match or no match. It is also called an assertion, as it does not consume any characters in matching. For example, the following complex regex is used to match email addresses by AngularJS:

^(?=.{1,254}$)(?=.{1,64}@)[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+(\.[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+)*@[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$

The first positive lookahead pattern ^(?=.{1,254}$) sets the maximum length to 254 characters. The second positive lookahead (?=.{1,64}@) sets a maximum of 64 characters before the '@' sign for the username.

Negative Lookahead (?!pattern)

Inverse of (?=pattern): match if pattern is missing. For example, a(?=b) matches 'a' in 'abc' (not consuming 'b'), but not in 'acc'. Whereas a(?!b) matches 'a' in 'acc', but not in 'abc'.
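Both kinds of lookahead can be tried in Python, as in the sketch below. The length-limited pattern at the end is a hypothetical, much simplified stand-in for the length checks in the email regex, not the AngularJS pattern itself.

```python
import re

# a(?=b): match 'a' only when the next character is 'b'; the 'b' is not consumed.
followed_by_b = re.search(r"a(?=b)", "abc")
not_followed = re.search(r"a(?=b)", "acc")

# a(?!b): match 'a' only when the next character is NOT 'b'.
negative_hit = re.search(r"a(?!b)", "acc")
negative_miss = re.search(r"a(?!b)", "abc")

# Hypothetical length-limited pattern: the lookahead checks the total length
# (1 to 8 characters) before [a-z]+ does the actual matching.
short_word = re.compile(r"^(?=.{1,8}$)[a-z]+$")
fits = short_word.search("abcdefgh") is not None
too_long = short_word.search("abcdefghi") is not None

print(followed_by_b.span(), negative_hit.span(), fits, too_long)
```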
Positive Lookbehind (?<=pattern)

[TODO]

Negative Lookbehind (?<!pattern)

[TODO]

Non-Capturing Group (?:pattern)

Recall that you can use parenthesized back-references to capture the matches. To disable capturing, use ?: inside the parentheses, in the form of (?:pattern). In other words, ?: disables the creation of a capturing group, so as not to create an unnecessary one.

Example: [TODO]

Named Capturing Group (?<name>pattern)

The capture group can be referenced later by \k<name>.

Atomic Grouping (?>pattern)

Disable backtracking, even if this may lead to match failure.

Conditional (?(Cond)then|else)

[TODO]

Unicode

The metacharacters \w, \W (word and non-word character) and \b, \B (word and non-word boundary) recognize Unicode characters.

What is regex in PHP?

Regex is short for Regular Expression, a method for searching for a pattern in a string. In PHP, the commonly used flavor is PCRE, or "Perl Compatible Regular Expressions".
What is a regex pattern?

A regex is text in the form of a search pattern; it is widely used for matching, searching, and manipulating text.

What is regex in JavaScript?

A Regular Expression (regex) is a pattern used as a criterion for matching combinations of characters in a string.

What does regexp stand for?

The abbreviations regex and regexp both stand for Regular Expression, a notation used in theoretical computer science, programming, software development, word processing and search engine optimization.