How to Group Anagrams Together in Separate Arrays

Photo by Glen Carrie on Unsplash

How to Group Anagrams Together in Separate Arrays

Play this article

I'm subscribed to cassidoo's newsletter and one of the interesting things she always has in there is an Interview question of the week. Usually I don't spend too much time actually solving these but this one looked fun and I thought I had a pretty good idea of how to do it so here it is and my solution.

Interview Question of the Week

buttondown.email/cassidoo/archive/almost-ev..

This week’s question: Given an array of strings, group the anagrams together in separate arrays. An anagram is a word or phrase formed by rearranging the letters of another word or phrase, using all the original letters exactly once.

Example:

$ groupAnagrams(["eat","tea","ten","poop","net","ate"])
$ [["poop"],["net","ten"],["eat","tea","ate"]]

Solution

Here is my ruby implementation of the solution. It works by summing the characters in each word to group them. Strings are simply arrays of characters and all characters have a value. By summing these characters we are left with a value for each word that can group words with the same value together.

words = ["eat","tea","ten","poop","net","ate"]
words.group_by { |word| word.sum }.values
[["eat", "tea", "ate"], ["ten", "net"], ["poop"]]

# Here is how it groups them by their value
words.group_by { |word| word.sum }
{314=>["eat", "tea", "ate"], 327=>["ten", "net"], 446=>["poop"]}

This works pretty well, unfortunately it has some edge cases that could ruin it. For example, since we are summing the value of the characters we may group words that are wildly different but still have the same value. For example, the word "a" * 100 (which is just a word with 100 a's) would be equal to "d" * 97 (which is just a word with 97 d's) because "a" = 97 and "d" = 100 so the two words would both equal 9700. We can fix this by sorting the characters in the word and use those as the grouping key.

words = ["eat","tea","ten","poop","net","ate"]
words.group_by { |word| word.chars.sort.join.to_sym }.values
[["eat", "tea", "ate"], ["ten", "net"], ["poop"]]

# Here is how it groups them by their sorted characters
words.group_by { |word| word.chars.sort.join.to_sym }
{:aet=>["eat", "tea", "ate"], :ent=>["ten", "net"], :oopp=>["poop"]}

This should handle all the edge cases that might appear. The only downside with this is sorting each word can get costly if the words are really big. Since this is just an exercise and I didn't want to spend too long on it I think this is good for now.

Did I miss anything? Is there a better solution? Probably. I'd love to hear about it!