August 29, 2019
Removing Duplicate Strings From a JavaScript Array
In this post, I'll discuss a real-world scenario I ran into recently and how I used some modern JavaScript features to solve the problem.
The problem
Earlier this week at work, I came across an interesting problem. We have an application that pulls some data from a database, processes the data, and then generates some new files which are then uploaded to a Google Cloud Storage bucket. To validate that our application process was working correctly, we were provided a source data file with which to compare the generated, processed files. In theory, they should match.
The problem is, they didn’t. Our QA team reported that the generated files did not contain as many records as the source files.
"Huh."
I was fairly confident that our application was working as intended, but could we have missed an edge case? Was our application dropping records unintentionally?
What if the source file contained duplicate records? To test my hypothesis, I needed a quick way to remove any duplicate records from the source data. Then, I'd compare the size of the de-duped source data with the output of our generated file.
The solution
The source data looked something like this (the actual source contained many more objects and many more properties in each object):
const source = [
  {ext_id: "12345", value: "curtain"},
  {ext_id: "84516", value: "movement"},
  {ext_id: "71458", value: "camp"},
  {ext_id: "91456", value: "folk"},
  {ext_id: "75124", value: "blue"},
  {ext_id: "90210", value: "human"},
  {ext_id: "33355", value: "reconcile"},
  {ext_id: "71458", value: "camp"},
  {ext_id: "99554", value: "shy"}
];
Unfortunately, there isn’t a quick and easy built-in way to isolate distinct objects inside an array in JavaScript. There are libraries that can do this, but I needed a quicker solution.
First, I created a new array of each object’s values concatenated as strings. If the source file entries had contained a unique ID property, I could have mapped each id into an array, but instead I needed to create a derived unique string for each object:
const sourceStrings = source.map((o) => Object.values(o).join(""));
// [ "12345curtain", "84516movement", "71458camp", "91456folk", "75124blue", "90210human", "33355reconcile", "71458camp", "99554shy" ]
sourceStrings.length
// 9
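One caveat worth flagging (it wasn't an issue with this particular data set, but it could bite with other data): joining values with no delimiter can, in theory, let two different objects produce the same string. A separator character that can't appear in the data avoids the collision. A minimal sketch:

```javascript
// Two distinct rows that concatenate to the same string ("123"):
const rows = [
  {ext_id: "12", value: "3"},
  {ext_id: "1", value: "23"}
];

const noDelimiter = rows.map((o) => Object.values(o).join(""));
const withDelimiter = rows.map((o) => Object.values(o).join("|"));

console.log(new Set(noDelimiter).size);   // 1 — the two rows collide
console.log(new Set(withDelimiter).size); // 2 — they stay distinct
```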
Then, to remove any duplicate strings, I created a new Set from the sourceStrings array:
const uniqueSet = new Set(sourceStrings);
// Set [ "12345curtain", "84516movement", "71458camp", "91456folk", "75124blue", "90210human", "33355reconcile", "99554shy" ]
Finally, to see how many unique values were stored in the uniqueSet, all I had to do was use the size property:
uniqueSet.size
// 8
Sure enough, the source file contained duplicate records. This was a pretty good indication that our application was working as intended. Phew!
All together now:
const uniqueSet = new Set(source.map((o) => Object.values(o).join("")));
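If I needed this check again, the same technique could be wrapped in a small helper. (countUniqueRecords is a name I'm inventing here for illustration; it wasn't part of the original fix.)

```javascript
// Hypothetical helper: count the distinct records in an array of
// flat objects by de-duping their concatenated values with a Set.
const countUniqueRecords = (records) =>
  new Set(records.map((o) => Object.values(o).join(""))).size;

const records = [
  {ext_id: "71458", value: "camp"},
  {ext_id: "90210", value: "human"},
  {ext_id: "71458", value: "camp"}
];

console.log(countUniqueRecords(records)); // 2
```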
Digging deeper
So why did that work?
Set, according to MDN:
lets you store unique values of any type, whether primitive values or object references.
By passing an iterable to the constructor (in my case, an array of strings), Set will create a new entry for each value not already stored. If it encounters a value already stored in the Set, it will ignore that value and move to the next iteration.
For example:
const arr = ['a', 'b'];
const set = new Set(arr);
set.add('a');
// Set [ "a", "b” ]
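Sets are also iterable, so if you need the unique values back as a plain array for further processing, spread syntax or Array.from will get you there:

```javascript
// Duplicates are dropped on construction; iteration preserves
// insertion order, so the round trip back to an array is stable.
const letters = new Set(['a', 'b', 'a']);

const backToArray = [...letters];        // spread syntax
const alsoAnArray = Array.from(letters); // equivalent

console.log(backToArray); // [ "a", "b" ]
```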
Conclusion
I don’t use Sets very often, but for this particular problem, one proved very useful.