Press "Enter" to skip to content

How and when to use Binary Search in a non-duplicate Array

Ajk 3

This is a simple guide on knowing how to write a binary search in a non-duplicate array. It could improve your code performance by a small to big amount depending on the size of the list you’re keeping track of and on whether you are adding and removing to the list often. It could also be very handy in interview questions.

I find myself often looking at code where the usual technique to add items to an array is just appending at the end. Whenever you need to remove items from the list you do a quick

or even worse

(and do not get me started about why I do not like jQuery)

Now let us examine the time complexity here. Insertion takes O(1) while removing an item takes O(n) (In case you did not know, indexOf() takes O(n)!!). Can we do better? Of course we can. Let’s try to keep a sorted array. The thing that I like best with using Binary Search, is that you can use the same piece of code for both insertion and deletion.

Code is pretty straight forward. The only thing you might find confusing is the | 0. As explained in the comment that does the same operation as Math.floor would. How is that you ask? Well | is the simple bitwise or operation. Bitwise operators convert numbers to a 32-bit sequence. So for any positive signed numbers (which is what we would have for indexes of an array), the conversion will drop the decimal part.

Now let’s do a little analysis, which would be very positive in an interview. When do we want to use our binarySearchAddRemove? The time complexity for adding and/or removing is O(log(n)). Hence in any case when we expect adding and removing items from a list we will be better off using binarySearchAddRemove. On the other hand if you were only going to add items to the list (and the possibly cleared it), you will be better off using the simple appending at the end of the list discussed at the beginning.

Edit: Since I was asked about it, below you’ll find a quick Java adaptation of the code. Enjoy 🙂

Edit2: New version of the code ;D with less checks (as per prof. Gries comment)

Final thoughts and things to note: Our code now runs a check at the end of the loop. This will decrease our best case performance (since now we have to go through all the loop all the time / cannot exit when we find the element), but will increase our performance for the worst case scenario, since we are removing 2 checks from the loop>
Other than that we have to be mindful about the 2 edge cases:
1. List is empty
2. Item we’re looking for is greater than everything in the list so it need to be appended at the end.

Finally with this final rewriting, we support a duplicate array too, where we will find the leftmost occurrence of our item :).

The following two tabs change content below.
If you like one of my posts the best way to support is give it a thumbs up, comment, or share it on social media 🙂
  1. Paul Gries Paul Gries

    Nice tip!

    Here’s a lovely interview-style followup: in your while loop, you test && !itemFound on every iteration. It of course evaluates to true at most once. The same thing goes for your if/else statement: the second clause, currElement > item, will only be false when/if you find the item.

    One consequence of this style of binary search is that, if there are duplicate items, you don’t know which one you found. It’s fairly straightforward to rewrite the loop so that it finds the index of the rightmost item <= the item you're looking for. Your code gets shorter as a result!

    That's the first challenge: simplify the boolean expression in your while condition, and also simplify your if/else/else so that it doesn't need the extra expressions. You'll have less code in your loop, although you'll need to add a statement after the loop to set itemFound.

    The second challenge is a profiling one: is this new approach faster? Always? What if the list is enormous so that swapping occurs — do you get a benefit if you exit early?

    • Ajk_P Ajk_P

      Professor Gries! It’s an honor to have you comment on my writing :).

      Very nice follow up question. I wrote the code for an out of the loop condition check which will increase our performance on the worst case (by removing the extra check(s) in every iteration), but will decrease it on the best case (where we will have to do the full log(n) runs of the loop and not immediately exit).

      I would assume that with gigantic lists we would like to exit the loop as early as possible, since the paging costs of running the loop until the end on best and average cases, will outweigh “a few extra checks per iteration” on the worst case scenario, therefore giving us a better average case complexity and a tiny bit worse worst-case. ;D

Leave a Reply

Your email address will not be published. Required fields are marked *