Sometimes we need to check Strings based on how the string sounds when it is spoken and wish to perform searching on the SOUND or pronunciation of the words. For such things, SQL Server provides us SOUNDEX and DIFFERENCE Function.
SOUNDEX Function returns a four-character code as an alphanumeric expression in VARCHAR datatype to compare the similarity between strings in terms of their sounds.
SOUNDEX ( character_expression )
The character_expression can be a constant, variable, or column and it can be nested using CONCAT(),’+’.
- The first character of the code is always the first character of character_expression, converted to UPPER case.If the first character of character_expression is anything other than 26 alphabets, It avoids other checkings and returns simply as ‘0000’.
- The second through fourth characters of the code are numbers that represent the letters in the expression based on how the string sounds when it is spoken.
- Except the first character of character_expression, remove all other occurrences of the letters A, E, I, O, U, W, H and Y. There are 8 vowels (Remember WHY- ‘W, H & Y’ are added as Phonetic Vowels based on sound) and therefore 18 consonants.
- Assign numbers to the remaining (18 consonants) letters (after the first) like below:
|b, f, p, v||1|
|c, g, j, k, q, s, x, z||2|
- If two or more letters with the same number were adjacent in the original name, or adjacent except for any intervening h and w, then omit all but the first.
- Zeroes will be added at the end of the result code if neccessary to make a four-character code.
First Example for Applying Algorithm-
SELECT SOUNDEX('SQLSERVERWHY') AS RESULT; -- S426
|1||S QLSERVER||First character taken|
|2||S QLSRVR||Phonetic vowels removed|
|3||S QLSRV||Repeated letters removed (after the first letter)|
|4||S 24261||Numbers assigned|
|5||S 4261||If two or more letters with the same number were adjacent(near to) in the original name,or adjacent except for any intervening h and w, then omit all but the first|
|6||S 426||Take 2 to 4 letters after the first.(4 zeroes padded at the end if necessary to bring 4-char code)|
Second Example for Applying Algorithm-
SELECT SOUNDEX('COfFeE') AS RESULT; -- C100
|1||C OfFeE||First character taken|
|2||C fF||Phonetic vowels removed|
|3||C F||Repeated letters removed (after the first letter)|
|4||C 1||Numbers assigned|
|5||C 100||After the first, only 1 letter.(4 zeroes padded at the end to bring 4-char code)|
DIFFERENCE Function performs a SOUNDEX Function on two strings to evaluate the similarity of two SOUNDEX codes, and returns an integer value in int datatype that indicates how similar the SOUNDEX codes are for those strings on a scale of 0 to 4.
- 0 indicates weak or no similarity between the SOUNDEX codes.
- 4 indicates strongly similar, same, or even identically matching of the SOUNDEX codes.
DIFFERENCE ( character_expression1 , character_expression2 )
SELECT DIFFERENCE('TREE','THREE') AS [Similarity_Level]; --4
NOTES while Using SOUNDEX and DIFFERENCE
- DIFFERENCE and SOUNDEX have collation sensitivity.
- The SOUNDEX and DIFFERENCE Function does not trims the leading and trailing spaces by nature. For more matching capability, USE LTRIM(RTRIM( Character_Expression )) or TRIM(Character_Expression), to remove leading and trailing spaces before doing SOUNDEX, DIFFERENCE Functions.
- If the input string is NULL or empty/blank/”, then SOUNDEX Function returns NULL or 0000 and DIFFERENCE Function returns NULL or 0 respectively.
- If the input string is two or more phrase words, SOUNDEX and DIFFERENCE works only for first phrase of the input string. (Eg.,) String 1 is ‘Dream big’ and String 2 is ‘Dream Small’. SOUNDEX and DIFFERENCE takes only first phrase as ‘Dream’ and returns both are same.
Example Code- 1. SOUNDEX, DIFFERENCE Behaviour
IF OBJECT_ID('[dbo].[Locality]') IS NOT NULL DROP TABLE [dbo].[Locality]; GO CREATE TABLE [dbo].[Locality]( [Locality_key] [int] IDENTITY(1,1) NOT NULL CONSTRAINT [PK_Locality] PRIMARY KEY CLUSTERED, [Locality] [varchar](100) NULL, [Custom_Locality] [varchar](100) NULL ); GO INSERT INTO [dbo].[Locality] ([Custom_Locality]) VALUES ('T_NAGAR'),('T-NAGAR'),('T.NAGAR'),('T NAGAR'),('TI NAGAR'),('TE NAGAR'), ('TEE NAGAR'),(' TI NAGAR'),(' T NAGAR'),(' TE NAGAR '),(' TEE NAGAR '), ('TNAGAR'),('TINAGAR'),('TENAGAR'),('TEENAGAR'),(' TENAGAR '),(' TEENAGAR '); GO UPDATE [dbo].[Locality] SET [Locality]='T. NAGAR'; GO --To check SOUNDEX,DIFFERENCE Behaviour SELECT [Locality],[Custom_Locality], SOUNDEX(RTRIM(LTRIM([Locality]))) AS SOUNDEX_Locality, SOUNDEX(RTRIM(LTRIM([Custom_Locality]))) AS SOUNDEX_Custom_Locality, DIFFERENCE([Locality],RTRIM(LTRIM([Custom_Locality]))) AS DIFFERENCE_Locality_Custom_Locality FROM [dbo].[Locality]; GO
Example Code- 2. Comparing of SOUNDEX Sound-Data Like Search Results in SQL Server
DECLARE @locality VARCHAR(MAX)='ti nagar'; SELECT DISTINCT locality FROM dbo.[Locality] WHERE SOUNDEX(custom_locality) = SOUNDEX(@locality);
Example Code- 3. Sorting of SOUNDEX Sound-Data Like Search Results in SQL Server
DECLARE @locality VARCHAR(MAX)='ti nagar'; SELECT locality,custom_locality FROM dbo.[Locality] WHERE SOUNDEX(custom_locality) = SOUNDEX(@locality) ORDER BY (CASE WHEN (CHARINDEX(@locality ,custom_locality)) = 1 AND LEN(custom_locality) = LEN(@locality ) THEN -1 ELSE CHARINDEX(@locality ,custom_locality) END),custom_locality; GO
Use SOUNDEX and DIFFERENCE functions when you want to Compare Sound-Like Data in SQL Server. Thanks for reading!