Kaspersky researcher Costin Raiu says the NSA couldn’t have done it without the source code.
What?!!
The contention that the NSA definitely had access to the source code is not only patent nonsense, it ignores that fact that Kaspersky themselves supposedly didn’t have the code. Having the source code is the easy way, perhaps the preferred way, but it’s hardly the only way.
A Reuters article speculates how the NSA might have obtained the source code and indeed, one of those is a likely scenario. But it’s also feasible to do the job without the source and I’ll show you what I mean, a technique I used to unravel computer fraud programs. Fasten your seat belt because this is going to get technical.
World’s Greatest Puzzle
Those around in my Criminal Brief days know that I love puzzles. For me, the ultimate puzzle has been systems software programming, making the machine do what I want. But sometimes I’ve come up against puzzles, some benign, some not, where I didn’t have the source code.
Let’s try an example. What if we found mysterious code in our computer that looked something like this:
Mysterious Snippet of Computer Code |
If you can’t make sense out of this, you’re not alone. 98% of computer programmers wouldn’t know what to make of it either. But if you look closely, the data populating the upper block looks different from that in the lower block. This is a clue.
Unlike commercial and scientific programs, systems software deals with the operation of the computer itself– utilities, communications, and especially the operating system. The realm of a computer’s internals are abstract, far more so than the Tron movies. Key aspects seldom relate to real-world equivalents. Sure, we say that RAM is a little like notes spread out on your work table and that disc storage is kinda sorta like a file cabinet… but not really. Even the term RAM– random access memory– is misleading; there’s nothing random about it.
Back in the real world, let’s say you want to write a simple program that adds the number of apples and oranges. In most programming languages, this code would look like this:
Internally, a program loads apples and oranges into registers (kind of like keying them into a calculator), adds them, and stores them in a variable called total. If we were to write this in the argot of the computer, we’d use assembly language mnemonics, an abstraction of the computer’s machine language. Deep, deep down in a program, we’d see nothing but numbers where we count…total = apples + oranges
Yes, A-F are digits in this context. Within the computer, our little program above might resemble…0, 1, 2, 3, 5, 6, 7, 8, 9, A, B, C, D, E, F
total = apples + oranges |
What isn’t obvious to many programmers is that computer instructions are data. Indeed, some black-hat crackers (the bad guys) have used this property to sneak malware onto unsuspecting computers.
If you look again at the original sneak peek of data, you’ll start to see patterns and may even pick out the machine instructions from our code example above.
Less Mysterious Code Snippet |
This puzzle solving is called reverse engineering. It’s possible to write a program called a disassembler (I have) or a de-compiler (I haven’t) to decode the machine language into something more intelligible. The program has to be smart enough to not only separate actual data from instructions, but distinguish the type of data.
As you see, compiling source into binary executable code isn’t a one-way street. With dedication and know-how, reversing the process is well within reach.
How safe do you feel now?