Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

1.2 String Manipulation

Last updated: September 5, 2024

See the Postgres docs Section 9.7 for a more in-depth description of the below.

LIKE, NOT LIKE

The LIKE/NOT LIKE keywords, which are often used in WHERE clauses to find matches between a string and a given pattern. Using this keyword, two specific characters are used to substitute for one or more characters in a pattern string:

If the pattern does not contain any percent sign (%) or underscore (_), then LIKE/NOT LIKE function as exact match.

Advanced string functionality

There are many string manipulation functions, such as substring, STRPOS (starting position of a substring), concatenation (||), and more. For example, the SUBSTRING function extracts the substring from a string with start and count parameters.

SELECT 'Hello' || 'World',
       STRPOS('Hello', 'el'),
       SUBSTRING('Hello', 2, 3);

In SQL the || operator means concatenation.

  ?column?  | strpos | substring 
------------+--------+-----------
 HelloWorld |      2 | ell

REGEXP_REPLACE

PostgreSQL can also handle (POSIX) regular expressions, e.g., with REGEXP_REPLACE.

General syntax:

REGEXP_REPLACE(source, pattern, replacement[, flags])

This follows standard POSIX regular expressions, and therefore you can use capture groups in both pattern (with parentheses to denote the subpattern to capture) and replacement (with \<number>, where number is the one-indexed index of the capture group.

To refresh on regular expressions, here are some examples:

sourcepatternreplacementflagsReturn valueMeaning
'Hannah Montana''(.*) (.*)''\1''Hannah'Extract the first name.
'Hannah Montana' '(.*) (.*)''\2, \1''Montana, Hannah'Extract last name, first name.
'Phone Number 510 642 3214''[a-zA-Z ]'''hone Number 510 642 3214Remove first alphanumeric character or space.
'Phone Number 510 642 3214''[a-zA-Z ]''''g'5106423214Remove all alphanumeric characters or spaces.

Here are some example queries:

  1. Extract first and last name of all people:

SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\2')
         as lastname,
       REGEXP_REPLACE(name, '(.*) (.*)', '\1')
         as firstname
FROM people;
  1. Count number of first names

SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\1')
         as firstname,
       COUNT(*) as countname
FROM people
GROUP BY firstname
ORDER BY countname desc;
  1. Compute length of longest first name

WITH firstnames AS (
    SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\1')
             as firstname
    FROM people
)
SELECT MAX(LENGTH(firstname)), firstname
FROM firstnames;