The Missing Excel Move

This post is a request for Excel devs to implement a novelty data navigation function.

I call it a ‘Levenshtein Move’.

When we review data, especially in a security field, and in particular sorted data gathered by Threat Hunters, we often come across data sets where multiple rows (often thousands, or tens of thousand) contain data very similar to each other.

At first, we obviously blame idiots who run tons of legitimate processes that use random values as a part of command line which makes our life hell, or similar idiots that make web logs full of almost identical entries, but then we still have to go through it.

For instance, if we see stuff like this:

  • hxxp://foobar.com?xyz=123456
  • hxxp://foobar.com?xyz=123457
  • hxxp://foobar.com?xyz=123458
  • hxxp://google.com

– it is only natural to want to skip the repetitive values and just go to the google one next (also see GIF embedded below).

Because once we see it, we know we can just exclude this data using a regex or a wildcard. However, that next item on the list may be 10K rows later; there is no way to find it unless you skillfully use mouse for scrolling, or use the old-fashioned PageDown brute-force attack.

Now imagine making the computer working it out for you.

You could use a Levenshtein distance between the current cell value, and the cells below. As long as the calculated distance is low, we could continue to progress. And only stop when the next cell is substantially different from the one that is currently selected:

The core code I used to demo it looks like this:

Sub Levenshtein_Move_down()
    On Error Resume Next
    c = ActiveCell.Column
    r = ActiveCell.Row
    firstcell = Cells(r, c).Value

    Do While True
        r = r + 1
        curcell = ActiveSheet.Cells(r, c)
        If curcell = "" Then
             Exit Do
        End If
        l = Levenshtein(firstcell, curcell)
        If l > 3 Then
            Exit Do
        End If
    Loop
    Cells(r, c).Select
End Sub

Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long

Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long

string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)

For i = 0 To string1_length
    distance(i, 0) = i
Next

For j = 0 To string2_length
    distance(0, j) = j
Next

For i = 1 To string1_length
    For j = 1 To string2_length
        If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
            distance(i, j) = distance(i - 1, j - 1)
        Else
            distance(i, j) = Application.WorksheetFunction.Min _
            (distance(i - 1, j) + 1, _
             distance(i, j - 1) + 1, _
             distance(i - 1, j - 1) + 1)
        End If
    Next
Next

Levenshtein = distance(string1_length, string2_length)

End Function

I borrowed the VBA code for Levenshtein distance from this post on Stack Overflow.

And yes, you can embed it in your Excel workbook and assign it to a shortcut.