Nov 12, Leetcode 295. Find Median From Data String- Tricky

https://leetcode.com/problems/find-median-from-data-stream/
Language: Python — Time: 90 minutes
Full Code on Gitfront.iohttps://github.com/cswartzell

Todays challenge was deceptively difficult, and fun. It seems simple enough, but extremely tight timing constraints required some advanced thinking to solve this one.

The goal is to setup a data structure such that the median of all as-yet stored values could be quickly retrieved. Clearly we’d need to store these in an ordered list to grab the median. Note the median of a set with an odd number of elements is the middle value, but if there are an even number of elements we need the mean of the central two.

My instinct was to use a heap as I knew maintaining the sorted order as we collect new elements needed to be quick. While it works well for this, unfortunately retrieving a list in sorted order from a heap is an additional O(nlogn). A heap does a great job of input, and retrieval of the top (minimum value), but we pay the price on the backend if we need the heap sorted in list form.

Can we cheat and just add the elements to an unsorted list, then sort and return the median only when called for? Alas, Time Limit Exceeded, as it probably should. It seems there is a Python library called “sortedList” that has optimized functions for doing this more directly, but I wasn’t aware of those while solving this.

Ok, what else? Insertion Sort is pretty fast, can we implement that? My first attempt was a flop as the selection sort algorithm I know appends to the rear of a list, then swaps it’s way forward through most other elements; almost bubble sort. I’ll need to review improved/correct versions of Insertion Sort as it’s quick and easy to code.

Still not quick enough. Well, why start by appending a value to the end an linearly scanning to find its intended position, we can do much better with a Binary Search. After a bit of fumbling with the usual uncertainties about the exact formatting of a Binary Search, I managed to write a method that quickly finds our intended position.

Another user on Leetcode has written an awesome template for quickly writing a binary search, and handling the little details so you dont end up with off by one errors. Seen Here.

Inserting our value using this index and string slicing and concatenation… is too slow. Wow, this is a strict challenge. Hard to beat Binary Search (impossible actually, right?). Well, we know concatenation is slow. Turns out list.insert(position, X) is quicker, and manages to pass!

But wait! After solving the problem I had an epiphany. I was thinking about using a Two Pointer solution, needing one pointer to keep the Maximum Value of all items to the left of the center, and another for the Minimum Value for all items the right. Hey, that’s just the tops of a minHeap and a maxHeap! We can use a heap structure to solve this after all, if we use two heaps. To deal with even/odd number of elements we can store a separate Middle variable. We need to keep the two heaps equally sized, so will be rotating values through this middle variable as we add new elements to one heap or the other.

After quite a bit of tinkering, I hit upon a solution requiring five distinct cases. I suspect with some refactoring I could clean this up a bit, but I’m satisfied with how it worked:

0: Initialize the structure (twice) with the first two elements

1: If there is no middle, should the new element be the middle?

2: If not, it should go on one heap or the other. pop an element from that heap to make room for the new element, retaining both heaps as equally sized. Popping from a heap is guaranteed to give us the element closest to the center so it can be the new middle.

3: If there is a middle, and the new element is equal to it, move middle into one heap, and the new element into the other. Doesn’t matter which, they are equal. Note we are setting Middle to “None” not 0

4: There is a middle, and the new element is larger than it. Add the new element to the “right”, which is our minHeap. Move the middle to the “left”, our max heap, and set middle to “None”.

5: Else (we have a middle value, and the new element is smaller) Simple reverse of step 4. Add to the maxHeap on the left, and moving the middle to the right heap, setting it to “None”

It turns out in Python there isn’t a built in maxHeap. Instead, this is accomplished by simply flipping the sign for all inputs and treating it as a “reverse minHeap”. With the sign flipped the “largest” input value will be at the top of the minHeap as the most negative number. We just need to remember to flip the sign again whenever we retrieve values from our improvised maxHeap. Python also suggests some combined operations that push and pop simultaneously, and apparently more efficient.

Using these two heaps, we can instantly return the median as either the Middle if not “None” or the mean of the tops of the heaps. A very satisfying answer, though it took quite a bit to get there.

Leave a comment