Tuesday, December 18, 2012

Merge Sort

goo.gl/d1wyx

Merge Sort Points:


Divide and Conquer (and combine):

Divide the unsorted list into 2 sub-lists. Sort the sub-lists recursively until a sub-list has size 1, at which point it is already sorted and returns itself.

Combine the sorted sub-lists back into the main list.
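Here is a minimal top-down sketch in C# (my illustration, not from the original post), using C# 8 range syntax and allocating temporary arrays in the merge step:

int[] MergeSort(int[] items)
{
    // Base case: a list of size 1 is already sorted and "returns itself".
    if (items.Length <= 1)
    {
        return items;
    }

    // Divide: split into two halves and sort each recursively.
    int mid = items.Length / 2;
    int[] left = MergeSort(items[..mid]);
    int[] right = MergeSort(items[mid..]);

    // Combine: merge the two sorted halves back into one sorted list.
    return Merge(left, right);
}

int[] Merge(int[] left, int[] right)
{
    int[] result = new int[left.Length + right.Length];
    int i = 0, j = 0, k = 0;

    // Repeatedly take the smaller front element of the two halves.
    while (i < left.Length && j < right.Length)
    {
        result[k++] = left[i] <= right[j] ? left[i++] : right[j++];
    }

    // Copy whatever remains in either half.
    while (i < left.Length) result[k++] = left[i++];
    while (j < right.Length) result[k++] = right[j++];

    return result;
}

The temporary arrays allocated in Merge are the extra space cost mentioned below.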

Analysis: O(n log n) average and worst case.

Although it runs in O(n log n) time, it needs O(n) extra space for the merges.

Merge sort is also the natural fit for external sorting. From Wikipedia:
An external merge sort is practical to run using disk or tape drives when the data to be sorted is too large to fit into memory. External sorting explains how merge sort is implemented with disk drives. A typical tape drive sort uses four tape drives. All I/O is sequential (except for rewinds at the end of each pass). A minimal implementation can get by with just 2 record buffers and a few program variables.





Tuesday, November 20, 2012

Graphs Adjacency Lists

See http://sourcecodemania.com/graph-implementation-in-cpp/ for details on an implementation in C++.


An issue that you may encounter when dealing with graphs is sparsity. In many cases you create an adjacency matrix to hold your data and realize it is mostly full of 0s. That may or may not matter depending on your application, but for a more space-efficient data structure, use the adjacency list (diagram above). The basic idea is to create an N x 1 array of linked lists. The elements in each linked list represent the nodes that the head node (a node in the graph) is connected to. Example from the diagram above: note that a is connected to d, b, and c. Look up a in the array, and a's linked list holds all of its connections.
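A minimal C# sketch of the idea (my illustration; it maps node names to integer indices and uses List&lt;int&gt; buckets in place of hand-rolled linked lists):

using System.Collections.Generic;

class AdjacencyListGraph
{
    // One bucket of neighbours per node: the "N x 1 array of linked lists".
    private readonly List<int>[] adjacency;

    public AdjacencyListGraph(int nodeCount)
    {
        adjacency = new List<int>[nodeCount];
        for (int i = 0; i < nodeCount; i++)
        {
            adjacency[i] = new List<int>();
        }
    }

    // Undirected edge: record the connection in both nodes' lists.
    public void AddEdge(int u, int v)
    {
        adjacency[u].Add(v);
        adjacency[v].Add(u);
    }

    // "Look up a in the array": all of a's connections sit in one bucket.
    public IReadOnlyList<int> Neighbours(int u) => adjacency[u];
}

With a = 0, b = 1, c = 2, d = 3, calling AddEdge(0, 1), AddEdge(0, 2), AddEdge(0, 3) puts b, c, and d in a's bucket, matching the example above.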

Graph to Adjacency Matrix


Graph Structures



I have always wondered how to represent graphs when programming. The answer came in the form of an adjacency matrix. This is an incredibly neat way to store all the nodes and represent the edges between pairs of nodes.

Below is a diagram of how it works:


Essentially, the rows and the columns both represent the nodes 1..4. Every time you have a connection, you flip the corresponding bit in the matrix from 0 to 1. So if there is an edge between 1 and 4, the entries at M(1,4) and M(4,1) will both be 1.
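A minimal C# sketch (my illustration; it uses a bool[,] in place of 0/1 bits and 0-based node indices):

class AdjacencyMatrixGraph
{
    private readonly bool[,] matrix;

    public AdjacencyMatrixGraph(int nodeCount)
    {
        matrix = new bool[nodeCount, nodeCount];
    }

    // Flip the bit in both directions: M(u, v) and M(v, u).
    public void AddEdge(int u, int v)
    {
        matrix[u, v] = true;
        matrix[v, u] = true;
    }

    // Edge lookup is O(1), the main strength of the matrix representation.
    public bool HasEdge(int u, int v) => matrix[u, v];
}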

Monday, November 12, 2012

Red Black Trees

So, you can get screwed if you have to construct a tree (via insert operations) from sorted data: you can end up with a highly unbalanced tree. How do you solve this?


Awesome explanation taken from SO (original post):

Red Black trees are good for creating well-balanced trees. The major problem with binary search trees is that you can make them unbalanced very easily. Imagine your first number is a 15. Then all the numbers after that are increasingly smaller than 15. You'll have a tree that is very heavy on the left side and has nothing on the right side.

Red Black trees solve that by forcing your tree to be balanced whenever you insert or delete. It accomplishes this through a series of rotations between ancestor nodes and child nodes. The algorithm is actually pretty straightforward, although it is a bit long.

The implementation is also not really so short so it's probably not really best to include it here. Nevertheless, trees are used extensively for high performance apps that need access to lots of data. They provide a very efficient way of finding nodes, with a relatively small overhead of insertion/deletion. 

While BSTs may not be used explicitly - one example of the use of trees in general are in almost every single modern RDBMS. Similarly, your file system is almost certainly represented as some sort of tree structure, and files are likewise indexed that way. Trees power Google. Trees power just about every website on the internet.
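To see the problem from the first paragraph concretely, here is a naive (non-balancing) BST insert in C# (my illustration): feed it sorted keys and the tree degenerates into a right-leaning chain of depth n, a glorified linked list with O(n) lookups.

class Node
{
    public int Key;
    public Node Left, Right;
    public Node(int key) { Key = key; }
}

// Naive BST insert with no rebalancing.
Node Insert(Node root, int key)
{
    if (root == null) return new Node(key);
    if (key < root.Key) root.Left = Insert(root.Left, key);
    else root.Right = Insert(root.Right, key);
    return root;
}

// Inserting 1, 2, 3, ... : every key goes right, so the "tree" is a chain.
// Node root = null;
// for (int k = 1; k <= 1000; k++) root = Insert(root, k);

A red-black tree's rotations on insert are exactly what prevent this chain from forming.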


Wednesday, November 7, 2012

BackTracking

The simplest explanation of this technique just sunk in for me so I thought I would put it up before I completely forget it.

So, it works like this....

Think about having to go down a decision tree (a highly branched tree).


Now there are a few possible answers and they are at the leaf level.  And imagine you are starting at the root level.


So, the way you would "backtrack" is to start down a branch (how to pick the first branch is a bit beyond me). You would continue down from one branch to another until you get to a leaf (most examples I saw do this recursively). If the leaf is the solution, you are done! If not, you back up (I think of it as rewinding) to the previous branching point and try down another branch.

Now obviously this is a super simple understanding of a complex technique. Things that deserve more thought include keeping track of where you have been, so that you do not go down a failed route again and again. A small sketch is below.
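Here is a minimal sketch in C# (my choice of example, not from the post), using subset-sum as the decision tree: at each node you branch on "include this number or not", and a false return is the rewind back up to the previous branching point.

// Is there a subset of items[index..] that sums to target?
bool SubsetSum(int[] items, int index, int target)
{
    if (target == 0) return true;             // leaf: solution found
    if (index == items.Length || target < 0)  // leaf: dead end
    {
        return false;
    }

    // Branch 1: include items[index] and go deeper.
    if (SubsetSum(items, index + 1, target - items[index]))
    {
        return true;
    }

    // Backtrack: branch 1 failed, rewind and try branch 2 (exclude it).
    return SubsetSum(items, index + 1, target);
}

// Example: SubsetSum(new[] { 3, 4, 3 }, 0, 6) tries 3 + 4, rewinds,
// and eventually finds 3 + 3 = 6.

Memoizing on (index, target) is one way to avoid re-exploring failed routes.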


PPT of BackTracking << Helped me in getting a better idea of how this works.

Tuesday, November 6, 2012

String Matching with Hash

Pretty clever algorithm that does string matching with hashing:


Rabin-Karp
http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm

But the worst-case scenario, though unlikely to happen, is that every hash collides, and then you get quadratic time.
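A hedged C# sketch of the rolling-hash idea (base and modulus chosen arbitrarily for illustration):

// Return the index of the first occurrence of pattern in text, or -1.
int RabinKarpSearch(string text, string pattern)
{
    const long B = 256;            // base: treat chars as base-256 digits
    const long Mod = 1000000007;   // large prime modulus
    int n = text.Length, m = pattern.Length;
    if (m == 0 || m > n) return -1;

    // B^(m-1) mod Mod, used to remove the window's leading character.
    long high = 1;
    for (int i = 0; i < m - 1; i++) high = high * B % Mod;

    // Hash the pattern and the first window of text.
    long patternHash = 0, windowHash = 0;
    for (int i = 0; i < m; i++)
    {
        patternHash = (patternHash * B + pattern[i]) % Mod;
        windowHash = (windowHash * B + text[i]) % Mod;
    }

    for (int i = 0; i + m <= n; i++)
    {
        // On a hash hit, verify char by char; this guards against
        // collisions and is where the (unlikely) quadratic worst case
        // mentioned above comes from.
        if (windowHash == patternHash && text.Substring(i, m) == pattern)
        {
            return i;
        }

        // Roll the hash: drop text[i], append text[i + m].
        if (i + m < n)
        {
            windowHash = (windowHash - text[i] * high % Mod + Mod) % Mod;
            windowHash = (windowHash * B + text[i + m]) % Mod;
        }
    }
    return -1;
}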


Monday, November 5, 2012

Selecting a data structure

When faced with selecting a data structure:

Look closely at the operations you need to perform:

INSERT, SEARCH, SORT

What is likely the most common operation? Read-only, write-mostly, etc.

Examine the set of candidate data structures.

Think about operational impact: I/O, memory, etc.


http://www.cs.sunysb.edu/~algorith/video-lectures/2007/lecture5.pdf

A balanced binary search tree holds up well across most of these operations, giving O(log n) search, insert, and delete.

Tuesday, October 23, 2012

Overestimation of big O

It is easy to overestimate the big O and get stuck quoting the upper bound when in fact the average (or amortized) case is more important.

Dynamic array example: a naive upper bound for n appends is O(n log n) (log n doublings, each copying up to n elements), but the true total is O(n), because the copy costs form a geometric series 1 + 2 + 4 + ... + n/2 that sums to less than n.
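A quick sanity check in C# (my own illustration): simulate n appends with capacity doubling and count the element copies.

// Count how many element copies n doubling-appends actually perform.
long CopiesForAppends(long n)
{
    long capacity = 1, size = 0, copies = 0;
    for (long i = 0; i < n; i++)
    {
        if (size == capacity)
        {
            copies += size;   // resize: copy every existing element
            capacity *= 2;
        }
        size++;
    }
    return copies;
}

// For n = 1,000,000 this returns 1,048,575, well under 2n,
// so the whole sequence of appends is O(n).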

Monday, October 22, 2012

Oh Oh O!


Log N is essentially how many times you halve N to get down to 1.

The log is the exponent: log_b(n) is the power to which you raise the base b to get n. For example, log2(1024) = 10 because 2^10 = 1024.

OR

The number of times you double 1 to get to N.

We do not care about bases; changing the base only changes a constant factor. The growth behavior matters more.

========

http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/


O(1)

O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.
bool IsFirstElementNull(String[] strings)
{
    // One comparison, regardless of how large the array is.
    if (strings[0] == null)
    {
        return true;
    }
    return false;
}

O(N)

O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.
bool ContainsValue(String[] strings, String value)
{
    // Worst case: scan every element once, i.e. N iterations.
    for (int i = 0; i < strings.Length; i++)
    {
        if (strings[i] == value)
        {
            return true;
        }
    }
    return false;
}

O(N^2)

O(N^2) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N^3), O(N^4), etc.
bool ContainsDuplicates(String[] strings)
{
    // Nested loops over the same data: N * N comparisons.
    for (int i = 0; i < strings.Length; i++)
    {
        for (int j = 0; j < strings.Length; j++)
        {
            if (i == j) // Don't compare with self
            {
                continue;
            }

            if (strings[i] == strings[j])
            {
                return true;
            }
        }
    }
    return false;
}

O(2^N)

O(2^N) denotes an algorithm whose growth doubles with each additional element in the input data set. The execution time of an O(2^N) function will quickly become very large.
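The classic illustration (my example, not from the quoted article) is naive recursive Fibonacci, where each call spawns two more calls:

// Call count roughly doubles as n grows by 1, i.e. O(2^N) time.
int Fibonacci(int n)
{
    if (n <= 1) return n;
    return Fibonacci(n - 1) + Fibonacci(n - 2);
}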

Logarithms

Logarithms are slightly trickier to explain so I’ll use a common example:
Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.
This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that peaks at the beginning and slowly flattens out as the size of the data sets increase e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1000 items will take three seconds. Doubling the size of the input data set has little effect on its growth as after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets.
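A minimal iterative C# sketch of the binary search just described:

// Halve the search range each iteration, as described above.
int BinarySearch(int[] sorted, int target)
{
    int low = 0, high = sorted.Length - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;         // the probe element
        if (sorted[mid] == target) return mid;    // success
        if (sorted[mid] < target) low = mid + 1;  // take the upper half
        else high = mid - 1;                      // take the lower half
    }
    return -1; // can no longer split the data set: not found
}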

Tuesday, October 9, 2012

Knapsack

If the set is {1, 2, 3}, how do I hit target 4?

No simple greedy rule solves this kind of problem correctly on every input; only exhaustive approaches are guaranteed.

Look hard for counterexamples. For example, greedy largest-first fails on the set {4, 3, 3} with target 6: it commits to 4 and gets stuck, even though 3 + 3 = 6 works.

Traveling Salesman

I love how ferociously people try to solve this problem (people without an algo background, and I include myself in that group). It is a sad realization that the only guaranteed-exact solution is essentially exhaustive search, and that approach becomes infeasible exponentially fast at industrial scale.


To show a proposed algorithm is incorrect, hunt for counterexamples:

Use small examples.

Think about ties.

Think about extreme cases.

Look for counterexamples among the instances that seem to fit.