TroubleShoot: impala \d in REGEXP not working

Ebates Coupons and Cash Back


Using REGEXP_EXTRACT in impala, to extract a number after a specific text pattern, you can use the below SQL.

Example: Say, The pattern you are looking for is the ‘numeric digits’ that comes after the TEXT/String ‘FBDV’

I’m using a hard-coded text value (‘Sample Text FBBV/3809028;FBDV/436876;FBV/iPhone7,2;FBMD/iPhone]’) for demo the result.

SELECT REGEXP_EXTRACT(c, 'FBDV\/(\\d*?)',1) from (
select 'Sample Text FBBV/3809028;FBDV/436876;FBV/iPhone7,2;FBMD/iPhone]' c) t

Output: 436876

If the value comes from a column in table, you can use:

SELECT REGEXP_EXTRACT({column_name}, 'FBDV\/(\\d*?)',1) from {table_name}

The trick here is: usually \d in REGEX denotes any numeric, but in Impala SQL inside capturing open/close parenthesis, you will have to escape the ‘blakslash’ with a backslash, so it requires \\d instead.


Ebates Coupons and Cash Back