TroubleShoot: Impala \d in REGEXP not working

To extract a number that follows a specific text pattern using REGEXP_EXTRACT in Impala, you can use the SQL below.

Example: say the pattern you are looking for is the numeric digits that come after the text ‘FBDV’.

I’m using a hard-coded text value (‘Sample Text FBBV/3809028;FBDV/436876;FBV/iPhone7,2;FBMD/iPhone]’) to demo the result.

SELECT REGEXP_EXTRACT(c, 'FBDV\/(\\d+)', 1)
FROM (SELECT 'Sample Text FBBV/3809028;FBDV/436876;FBV/iPhone7,2;FBMD/iPhone]' c) t

Output: 436876

If the value comes from a column in a table, you can use:

SELECT REGEXP_EXTRACT({column_name}, 'FBDV\/(\\d+)', 1) FROM {table_name}

The trick here is: \d in a regex normally matches any digit, but Impala treats the backslash as an escape character inside SQL string literals, so you have to escape the backslash itself and write \\d for the pattern to reach the regex engine as \d.


TroubleShoot: Table Not Found Spark-Submit

When you know that a table exists in Hive / Impala (either because you created it, or because you can query it and view its data through the UI or CLI), and you can also query the table from pyspark or spark-shell, but the application throws a ‘Table not found’ error when you access the same table through spark-submit, then the most likely cause is that

You are using SQLContext on your SparkContext [ SQLContext(sc) ] rather than HiveContext [ HiveContext(sc) ].

Change your code from SQLContext to HiveContext, and it will be able to find the table again!
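
For illustration, here is a minimal PySpark sketch of that change (Spark 1.x style; the app name and my_db.my_table below are placeholders, not names from the original application):

from pyspark import SparkContext
from pyspark.sql import HiveContext   # instead of: from pyspark.sql import SQLContext

sc = SparkContext(appName="table_lookup_demo")

# SQLContext(sc) has no Hive support, so it cannot resolve tables that live
# in the Hive metastore - hence the 'Table not found' error under spark-submit.
# HiveContext(sc) reads the Hive metastore, so the same query finds the table.
sqlContext = HiveContext(sc)

df = sqlContext.sql("SELECT * FROM my_db.my_table LIMIT 10")   # my_db.my_table is a placeholder
df.show()

(This is also why the table is visible from the pyspark / spark-shell prompt: a shell built with Hive support creates its default sqlContext as a HiveContext.)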


Extract Folder names from hadoop

If you are looking to extract all the folder names under a certain directory in Hadoop (HDFS), here is how to do it:

Say you are looking under the directory /user/{username}; ‘hadoop fs -ls /user/{username}’ gives the below output.

drwxr-xr-x   - user  user  0 2017-08-30 13:44 /user/{username}/product
drwxr-xr-x   - user  user  0 2017-09-01 13:39 /user/{username}/quote_lines

And the below command (the -f4 field number in cut matches the depth of /user/{username}; adjust it if your directory sits deeper):

hadoop fs -ls /user/{username} | grep "^d" | awk -F':[0-9]* ' '/:/{print $2}' | cut -d'/' -f4

gives the directories:

product
quote_lines
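
If you would rather collect the names from a script instead of a shell pipeline, here is a rough Python sketch of the same idea (it assumes the hadoop CLI is on the PATH, and /user/{username} is still a placeholder for your own path); taking the last path component also avoids having to adjust the cut field number:

import subprocess

# Run the same listing command and capture its output as text.
listing = subprocess.check_output(
    ["hadoop", "fs", "-ls", "/user/{username}"], universal_newlines=True
)

folders = []
for line in listing.splitlines():
    if line.startswith("d"):                 # directories only, same as grep "^d"
        full_path = line.split()[-1]         # last field is the full HDFS path
        folders.append(full_path.rstrip("/").rsplit("/", 1)[-1])  # final path component

print("\n".join(folders))                    # product, quote_lines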
