Chapter 18 — Java Regular Expressions

Mastering pattern matching for powerful string searching, validation, and manipulation.

1. Introduction to Regex

What are Regular Expressions?

A Regular Expression (regex or regexp) is a sequence of characters that specifies a search pattern. It's an extremely powerful tool for finding, validating, and manipulating text based on predefined patterns.

Instead of writing complex loops and conditional logic to check whether a string "looks like" an email address, you can define a regex pattern for an email address and test the string against it. Java's regex support is provided by the `java.util.regex` package, which is built around two main classes:

  • Pattern: A compiled representation of a regular expression. You create a `Pattern` object from a regex string.
  • Matcher: An engine that performs match operations on an input string by interpreting a `Pattern`.
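To make the loop-versus-pattern contrast concrete, here is a minimal sketch. It uses a simpler rule than an email address: "the string consists only of the digits 0-9". The class and method names (`LoopVersusRegex`, `isDigitsLoop`, `isDigitsRegex`) are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class LoopVersusRegex {
    // Hand-written check: non-empty, and every character is an ASCII digit
    static boolean isDigitsLoop(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // The same rule as a pattern: one or more digits and nothing else
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean isDigitsRegex(String s) {
        // matches() succeeds only if the entire string fits the pattern
        return DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2023", "20a3", ""}) {
            System.out.println("\"" + s + "\" -> loop: " + isDigitsLoop(s) + ", regex: " + isDigitsRegex(s));
        }
    }
}
```

Both methods give the same answers. The regex version states the rule in a single pattern, while the loop spells out every check by hand.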
2. Regular Expression Syntax

Building Your Patterns

The power of regex comes from its special syntax. Here are the most common building blocks:

| Subexpression | Matches | Example |
|---|---|---|
| `[abc]` | Any one of the characters a, b, or c | `[bt]est` matches "best" and "test" |
| `[^abc]` | Any character except a, b, or c | `[^t]est` matches "rest" but not "test" |
| `[a-zA-Z]` | a through z, or A through Z, inclusive | `[a-z]+` matches "java" |
| `\d` | Any digit (equivalent to `[0-9]`) | `\d{3}` finds "123" in "abc123" |
| `\D` | Any non-digit | `\D+` finds "abc" in "abc123" |
| `\s` | Any whitespace character (space, tab, newline, and so on) | `\s+` finds " " in "Hello World" |
| `\S` | Any non-whitespace character | `\S+` finds "Hello" in "Hello World" |
| `\w` | Any word character (`[a-zA-Z0-9_]`) | `\w+` finds "var1" in "var1 = 5;" |
| `\W` | Any non-word character | `\W+` finds " = " in "var1 = 5;" |
| `X?` | X, once or not at all | `colou?r` matches "color" and "colour" |
| `X*` | X, zero or more times | `ab*` matches "a", "ab", "abb", etc. |
| `X+` | X, one or more times | `ab+` matches "ab", "abb", etc., but not "a" |
| `X{n}` | X, exactly n times | `\d{5}` matches exactly five digits, such as "90210" |
| `X{n,}` | X, at least n times | `\d{2,}` matches two or more digits |
| `X{n,m}` | X, at least n but not more than m times | `\d{2,4}` matches 2, 3, or 4 digits |
| `^` | The beginning of the input (or of each line in `MULTILINE` mode) | `^Hello` matches "Hello" only at the start of the input |
| `$` | The end of the input (or of each line in `MULTILINE` mode) | `World$` matches "World" only at the end of the input |

The table shows raw regex syntax. Inside a Java string literal every backslash must be doubled, so the regex `\d{3}` is written as `"\\d{3}"` in source code.

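The short program below is a sketch that checks a few rows of the table in code. `firstMatch` is a hypothetical helper written for this illustration. It returns the first substring that `find()` locates, or `null`. The `(?m)` prefix is Java's embedded flag for `MULTILINE` mode.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxDemo {
    // Returns the first substring of input that the regex finds, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d{3}", "abc123"));       // 123
        System.out.println(firstMatch("\\W+", "var1 = 5;"));      // prints " = " (space, equals, space)
        System.out.println(Pattern.matches("colou?r", "colour")); // true
        System.out.println(Pattern.matches("ab+", "a"));          // false: + needs at least one b
        System.out.println(Pattern.matches("\\d{2,4}", "12345")); // false: five digits is too many

        // By default ^ anchors only to the start of the whole input...
        String text = "first line\nHello there";
        System.out.println(firstMatch("^Hello", text));           // null
        // ...but with the (?m) MULTILINE flag it anchors to the start of every line
        System.out.println(firstMatch("(?m)^Hello", text));       // Hello
    }
}
```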
3. Pattern & Matcher Classes

The Core of Java Regex

To use a regex in Java, you follow a two-step process:

  1. Compile the regex string into a Pattern object with the static `Pattern.compile()` method. Compilation parses the regex, so if you use the same pattern many times, compile it once and reuse the resulting `Pattern`.
  2. Create a Matcher object by calling `pattern.matcher(inputString)` on the `Pattern`. The `Matcher` performs the actual match operations on that specific input string.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSetup {
    public static void main(String[] args) {
        // 1. Define the regex string
        String regex = "\\d+"; // Matches one or more digits

        // 2. Compile the regex into a Pattern object
        Pattern pattern = Pattern.compile(regex);

        // 3. Create a Matcher object for a specific input string
        String input = "The year is 2023.";
        Matcher matcher = pattern.matcher(input);

        // Now, we can use the matcher to find and work with matches
        // (see Section 5, "Methods of the Matcher Class")
    }
}
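The advice in step 1 about compiling once can be sketched like this: the `Pattern` lives in a `static final` field, and each call creates its own `Matcher`. `ReusePattern` and `findYear` are hypothetical names for this example. The split matters beyond performance: the `Pattern` Javadoc states that `Pattern` instances are immutable and safe for concurrent use, while `Matcher` instances are not, so a single `Matcher` should not be shared.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusePattern {
    // Compiled once when the class loads, then shared by every call
    private static final Pattern YEAR = Pattern.compile("\\d{4}");

    // Hypothetical helper: returns the first run of four digits in the text, or null
    static String findYear(String text) {
        Matcher matcher = YEAR.matcher(text); // a fresh Matcher for each input
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findYear("The year is 2023."));              // 2023
        System.out.println(findYear("Released in 1995, updated 2001")); // 1995
        System.out.println(findYear("No year here"));                   // null
    }
}
```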
4. Capturing Groups

Extracting Parts of a Match

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing parts of the regex inside parentheses (...). They are incredibly useful for extracting specific pieces of information from a matched string.

For example, to extract the area code from a phone number, you can group the area code part of the pattern.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingGroups {
    public static void main(String[] args) {
        String text = "My number is (123) 456-7890.";
        // Group 1 (\d{3}) captures the area code, group 2 the exchange, group 3 the line number
        String regex = "\\((\\d{3})\\) (\\d{3})-(\\d{4})";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            // group(0) or group() returns the entire matched string
            System.out.println("Full match: " + matcher.group(0)); // (123) 456-7890

            // group(1) returns the first captured group
            System.out.println("Area Code: " + matcher.group(1)); // 123

            // group(2) returns the second captured group
            System.out.println("Exchange: " + matcher.group(2)); // 456

            // group(3) returns the third captured group
            System.out.println("Line Number: " + matcher.group(3)); // 7890
        }
    }
}
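The example above focuses on extraction. The "single unit" idea can be sketched separately: a quantifier placed after a group repeats the whole group. `lastCapture` is a hypothetical helper for this illustration. The sketch also shows `groupCount()` and what a repeated group retains after the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupAsUnit {
    // Returns what capturing group 1 held after a full match, or null if the input does not match
    static String lastCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Without parentheses, + repeats only the 'a', so "ha+" accepts "haaa" but not "haha"
        System.out.println(Pattern.matches("ha+", "haha"));   // false
        // With parentheses, + repeats the whole "ha" unit
        System.out.println(Pattern.matches("(ha)+", "haha")); // true

        // groupCount() reports how many capturing groups the pattern defines
        System.out.println(Pattern.compile("(\\d)+").matcher("2023").groupCount()); // 1
        // A repeated group keeps only its last repetition: here the final digit
        System.out.println(lastCapture("(\\d)+", "2023")); // 3
    }
}
```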
5. Methods of the Matcher Class

Performing the Match

Once you have a `Matcher` object, you can use its methods to find and work with matches. Here are the most important ones.

boolean matches()

Attempts to match the entire input sequence against the pattern. It returns `true` only if the whole string matches the regex.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesMethod {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("Java");
        Matcher matcher1 = pattern.matcher("Java");
        Matcher matcher2 = pattern.matcher("I love Java programming");

        System.out.println(matcher1.matches()); // true, the whole string is "Java"
        System.out.println(matcher2.matches()); // false, the string contains more than just "Java"
    }
}
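Because `matches()` must consume the whole input, it is a natural fit for validation, and the pattern needs no `^` or `$` anchors. Here is a minimal sketch. It assumes a hypothetical rule of exactly five digits (like a basic US ZIP code) and a hypothetical `isZip` helper.

```java
import java.util.regex.Pattern;

public class ZipValidator {
    // Hypothetical rule for this sketch: exactly five digits
    private static final Pattern ZIP = Pattern.compile("\\d{5}");

    static boolean isZip(String s) {
        // matches() must consume the whole string, so no ^ or $ anchors are needed
        return ZIP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isZip("90210"));     // true
        System.out.println(isZip("902101"));    // false: six digits
        System.out.println(isZip("ZIP 90210")); // false: extra text around the digits
        // String.matches is a convenient shortcut for one-off checks
        System.out.println("90210".matches("\\d{5}")); // true
    }
}
```

`String.matches(regex)` gives the same answer as `Pattern.matches(regex, input)`, but it compiles the regex on every call. A precompiled `Pattern` is therefore the better choice for checks that run repeatedly.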

boolean find()

Scans the input sequence for the next subsequence that matches the pattern. Each call resumes the search where the previous match ended, so calling `find()` in a `while` loop visits every occurrence in turn. It is the most common way to locate one or more occurrences within a larger string.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindMethod {
    public static void main(String[] args) {
        String text = "a1b2c3d4";
        Pattern pattern = Pattern.compile("\\d"); // Find any single digit
        Matcher matcher = pattern.matcher(text);

        System.out.println("Finding all digits in '" + text + "':");
        while (matcher.find()) {
            System.out.println("Found digit: " + matcher.group() + " at index " + matcher.start());
        }
    }
}
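A common next step is collecting every match into a list. The sketch below shows the `find()` loop alongside `Matcher.results()`, which assumes Java 9 or later and streams one `MatchResult` per match. The hashtag pattern and the method names are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FindAll {
    // Hypothetical pattern for this sketch: '#' followed by one or more word characters
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Classic approach: call find() in a loop and collect each group()
    static List<String> findAllLoop(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Java 9+ alternative: results() yields one MatchResult per successful find()
    static List<String> findAllStream(String text) {
        return HASHTAG.matcher(text)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Loving #java and #regex today";
        System.out.println(findAllLoop(text));   // [#java, #regex]
        System.out.println(findAllStream(text)); // [#java, #regex]
    }
}
```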

String group()

Returns the input subsequence matched by the previous match operation (`find()` or `matches()`). As seen in the capturing groups section, you can pass an integer to `group(int)` to retrieve a specific capturing group.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupMethod {
    public static void main(String[] args) {
        String text = "user_id: 12345";
        Pattern pattern = Pattern.compile("user_id: (\\d+)");
        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            // group() is the same as group(0)
            String fullMatch = matcher.group(); 
            String capturedId = matcher.group(1);

            System.out.println("Full match: " + fullMatch); // "user_id: 12345"
            System.out.println("Captured ID: " + capturedId); // "12345"
        }
    }
}
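The sketch below covers two edge cases of `group()`. It uses a hypothetical size pattern and a hypothetical `unitOf` helper. First, a capturing group that took no part in a successful match returns `null`. Second, calling `group()` before any successful `find()` or `matches()` throws an `IllegalStateException`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroup {
    // Group 1: the number; group 2: an optional "px" suffix
    private static final Pattern SIZE = Pattern.compile("(\\d+)(px)?");

    // Returns the unit, or "(none)" when the optional group took no part in the match
    static String unitOf(String input) {
        Matcher m = SIZE.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a size: " + input);
        }
        String unit = m.group(2); // null if (px)? matched nothing
        return unit == null ? "(none)" : unit;
    }

    public static void main(String[] args) {
        System.out.println(unitOf("12px")); // px
        System.out.println(unitOf("12"));   // (none)

        Matcher fresh = SIZE.matcher("12px");
        try {
            fresh.group(); // no find() or matches() has run on this matcher yet
        } catch (IllegalStateException e) {
            System.out.println("group() before any match throws IllegalStateException");
        }
    }
}
```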

int start() / int end()

After a successful match operation, `start()` returns the start index of the match, and `end()` returns the index of the first character *after* the match.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartEndMethod {
    public static void main(String[] args) {
        String text = "Error 404: Not Found";
        Pattern pattern = Pattern.compile("\\d{3}");
        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            int startIndex = matcher.start();
            int endIndex = matcher.end();
            System.out.println("Found match at index " + startIndex); // Found match at index 6
            System.out.println("Match ends at index " + endIndex);    // Match ends at index 9
            System.out.println("Matched substring: " + text.substring(startIndex, endIndex)); // Matched substring: 404
        }
    }
}
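`start()` and `end()` also accept a group number, which gives the position of a single capturing group instead of the whole match. The sketch below reuses the same error message text. The `Error (\\d{3}): (.+)` pattern and the `groupSpan` helper are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupPositions {
    // Group 1 captures the status code, group 2 the message after ": "
    private static final Pattern STATUS = Pattern.compile("Error (\\d{3}): (.+)");

    // Returns {start, end} of the given capturing group, or null if the pattern is not found
    static int[] groupSpan(String text, int group) {
        Matcher m = STATUS.matcher(text);
        return m.find() ? new int[] {m.start(group), m.end(group)} : null;
    }

    public static void main(String[] args) {
        String text = "Error 404: Not Found";
        int[] code = groupSpan(text, 1);
        int[] message = groupSpan(text, 2);
        System.out.println("Code spans " + code[0] + "-" + code[1]);      // Code spans 6-9
        System.out.println(text.substring(message[0], message[1]));      // Not Found
    }
}
```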

String replaceAll(String replacement)

Replaces every subsequence of the input that matches the pattern with the given replacement string and returns the result as a new `String`. The original input string is not modified.

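import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllMethod {
    public static void main(String[] args) {
        String text = "My SSN is 123-45-6789, keep it secret.";
        // Replace every digit with X (a \d{3} pattern would miss "45" and the final "9")
        Pattern pattern = Pattern.compile("\\d");
        Matcher matcher = pattern.matcher(text);
        String redacted = matcher.replaceAll("X");

        System.out.println("Original: " + text);     // Original: My SSN is 123-45-6789, keep it secret.
        System.out.println("Redacted: " + redacted); // Redacted: My SSN is XXX-XX-XXXX, keep it secret.
    }
}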
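In the replacement string, `$n` refers to capturing group n, which lets `replaceAll()` rearrange matched text. It also makes `$` and `\` special characters there. The sketch below uses a hypothetical date format and hypothetical helper names. It reorders `yyyy-mm-dd` dates and uses `Matcher.quoteReplacement()` to insert a literal `$`. Without that call, a replacement such as `"$DATE"` would be read as a group reference and throw an exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceWithGroups {
    // Group 1: year, group 2: month, group 3: day
    private static final Pattern ISO_DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Rewrites every yyyy-mm-dd date as dd/mm/yyyy using $n group references
    static String toDayFirst(String text) {
        return ISO_DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    // Replaces each date with a literal label; quoteReplacement escapes $ and \
    static String withLiteral(String text, String label) {
        return ISO_DATE.matcher(text).replaceAll(Matcher.quoteReplacement(label));
    }

    public static void main(String[] args) {
        String text = "Start 2023-12-25, end 2024-01-05.";
        System.out.println(toDayFirst(text));           // Start 25/12/2023, end 05/01/2024.
        System.out.println(withLiteral(text, "$DATE")); // Start $DATE, end $DATE.
    }
}
```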
6. Practice & Challenge

Test Your Skills

  1. Write a program to check if a string contains only letters (a-z, A-Z).
  2. Write a program to extract all numbers from a given text.
  3. Write a program to validate a simple URL format (e.g., `http://www.example.com`).
  4. Write a program to find all words in a string that start with the letter 'a'.
  5. Write a program to replace all multiple spaces in a string with a single space.

🏆 Challenge: Advanced Password Validator

Create a program that validates a password based on the following complex rules:

  • At least 8 characters long.
  • Contains at least one uppercase letter (A-Z).
  • Contains at least one lowercase letter (a-z).
  • Contains at least one digit (0-9).
  • Contains at least one special character (e.g., `!@#$%^&*`).
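Use lookaheads (`(?=...)`) to combine these rules into a single regex pattern. A lookahead checks whether its subpattern can match from the current position without consuming any characters. Several lookaheads can therefore be stacked at the start of the pattern, each testing one rule against the whole password.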

import java.util.Scanner;
import java.util.regex.Pattern;

public class PasswordValidator {
    public static void main(String[] args) {
        // Regex with lookaheads to check for all conditions
        // (?=.*[A-Z])  -> at least one uppercase letter
        // (?=.*[a-z])  -> at least one lowercase letter
        // (?=.*\\d)    -> at least one digit
        // (?=.*[!@#$%^&*]) -> at least one special character
        // .{8,}         -> at least 8 characters long
        String passwordRegex = "^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[!@#$%^&*]).{8,}$";
        Pattern pattern = Pattern.compile(passwordRegex);

        Scanner scanner = new Scanner(System.in);
        System.out.println("--- Password Validator ---");
        System.out.print("Enter a password to validate: ");
        String password = scanner.nextLine();

        if (pattern.matcher(password).matches()) {
            System.out.println("Password is strong and valid!");
        } else {
            System.out.println("Password is invalid. It must meet all criteria:");
            System.out.println("- At least 8 characters long");
            System.out.println("- At least one uppercase letter");
            System.out.println("- At least one lowercase letter");
            System.out.println("- At least one digit");
            System.out.println("- At least one special character (!@#$%^&*)");
        }

        scanner.close();
    }
}
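The `Scanner`-based program is hard to test automatically. The sketch below moves the same regex into a hypothetical `isStrong` method and tries a few sample passwords. It also shows that a lookahead is zero-width: `(?=\\d)` applied to `"ab1"` produces an empty match at index 2, right before the digit, without consuming it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Same rules as the challenge regex above, compiled once
    private static final Pattern STRONG =
            Pattern.compile("^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[!@#$%^&*]).{8,}$");

    static boolean isStrong(String password) {
        return STRONG.matcher(password).matches();
    }

    public static void main(String[] args) {
        // A lookahead is zero-width: it checks ahead but consumes no characters
        Matcher m = Pattern.compile("(?=\\d)").matcher("ab1");
        if (m.find()) {
            System.out.println(m.start() + " " + m.end()); // 2 2 (an empty match before '1')
        }

        for (String candidate : new String[] {"Passw0rd!", "password1!", "Pass!1"}) {
            System.out.println(candidate + " -> " + isStrong(candidate));
        }
        // Passw0rd! -> true; password1! -> false (no uppercase); Pass!1 -> false (too short)
    }
}
```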