1.2. String Manipulation#
Last updated: September 5, 2024
See the Postgres docs Section 9.7 for a more in-depth description of the below.
1.2.1. LIKE, NOT LIKE#
The LIKE/NOT LIKE
keywords, which are often
used in WHERE
clauses to find matches between a string and a given
pattern. Using this keyword, two specific characters are used to
substitute for one or more characters in a pattern string:
%
matches zero or more characters_
matches a single character
If the pattern does not contain any percent sign (%
) or underscore (_
), then LIKE/NOT LIKE
function as exact match.
1.2.2. Advanced string functionality#
There are many string manipulation functions, such as substring
,
STRPOS
(starting position of a substring), concatenation (||
), and more. For
example, the SUBSTRING
function extracts the substring from a string
with start and count parameters.
Note
Unlike in Python, SQL strings are indexed from 1.
SELECT 'Hello' || 'World',
STRPOS('Hello', 'el'),
SUBSTRING('Hello', 2, 3);
In SQL the ||
operator means concatenation.
?column? | strpos | substring
------------+--------+-----------
HelloWorld | 2 | ell
1.2.3. REGEXP_REPLACE#
PostgreSQL can also handle (POSIX) regular expressions, e.g., with REGEXP_REPLACE
.
General syntax:
REGEXP_REPLACE(source, pattern, replacement[, flags])
source
: source string, or column name, etc.pattern
: POSIX regular expressionreplacement
: replacement string when pattern matchesflags
: Optional parameter. Pass ing
for to process all matches. See PostgresSQL docs Table 9.24 for the full list of options.
This follows standard POSIX regular expressions, and therefore you can use capture groups in both pattern
(with parentheses to denote the subpattern to capture) and replacement
(with \<number>
, where number is the one-indexed index of the capture group.
To refresh on regular expressions, here are some examples:
|
|
|
|
Return value |
Meaning |
---|---|---|---|---|---|
|
|
|
|
Extract the first name. |
|
|
|
|
|
Extract last name, first name. |
|
|
|
|
|
Remove first alphanumeric character or space. |
|
|
|
|
|
|
Remove all alphanumeric characters or spaces. |
Here are some example queries:
Extract first and last name of all people:
SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\2')
as lastname,
REGEXP_REPLACE(name, '(.*) (.*)', '\1')
as firstname
FROM people;
Count number of first names
SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\1')
as firstname,
COUNT(*) as countname
FROM people
GROUP BY firstname
ORDER BY countname desc;
Compute length of longest first name
WITH firstnames AS (
SELECT REGEXP_REPLACE(name, '(.*) (.*)', '\1')
as firstname
FROM people
)
SELECT MAX(LENGTH(firstname)), firstname
FROM firstnames;