The paper [1] claims some impossibility results for the task of obfuscating computer programs, and it has generated some discussion, and some confusion (e.g., see [2]). In this essay I try, as one of the authors of that paper, to explain (my opinion of) what these results mean and what they do not mean.
I don't want to elaborate too much on why people are interested in obfuscators and why someone would think they might exist. I will just say that obfuscators can be very useful for many applications, in particular software protection and digital rights management (DRM), and that I believe anyone who has ever tried to figure out someone else's code can understand why obfuscators may exist (at least human ones... :) ). For more discussion, see the paper [1].
I like to divide all cryptographic constructions, be they encryption schemes, digital signatures, hash functions, or obfuscators, into two categories, depending on the level of security they possess. I call these categories well-defined security and fuzzy security.
Well-defined security. A good example of well-defined security is digital signatures. Today, the standard definition of a secure digital signature scheme is existential unforgeability. Roughly speaking, this means that even if a hacker gets access to signatures on many messages of her choice, she will still not be able to forge a signature on even a single new message. Now, this definition seems much stronger than necessary. Indeed, in most applications where digital signatures are used, it is unlikely that a hacker will be able to obtain signatures on messages of her choice. Also, forging a signature on an arbitrary message would not really help her break the system unless this message is meaningful. For example, in some applications it will be useless for her to forge a signature on a message that is not in English, while in others it will be useless for her to forge a signature on a message that is not in well-formed XML syntax. Even though the existential unforgeability definition is quite strong, it is considered to be the "right" definition of security, because it allows us to plug digital signatures into various applications and rest assured that the type of messages used in the application, be it English, French, or XML, will not cause a security problem. Indeed, constructing secure applications is complicated enough without having to worry about digital signatures with subtle weaknesses. Also, there are in fact several known constructions that are widely believed to satisfy this strong definition. It's true that none of them has been unconditionally proven to satisfy it (yet), but for some of them we can show that if there exists a forger then there exists an efficient algorithm for a known hard computational problem (such as the problem of efficiently factoring large integers).
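To make the game behind this definition concrete, here is a minimal sketch in code. The sign()/verify() pair is a hypothetical, deliberately toy placeholder (it is of course trivially breakable); only the structure of the attack game, and what it means for the hacker to win, is the point.

/* A toy sketch of the existential-unforgeability game.  The sign()/verify()
   pair (XOR with a fixed key) is a hypothetical placeholder and is of course
   trivially breakable; only the structure of the game matters here.        */
#include <stdio.h>

static unsigned char KEY = 0x5A;   /* hypothetical secret signing key */

unsigned char sign(const char *m) {            /* toy signing algorithm */
    unsigned char s = KEY;
    while (*m) s ^= (unsigned char)*m++;
    return s;
}

int verify(const char *m, unsigned char s) {   /* toy verification */
    return sign(m) == s;
}

int main(void) {
    /* Phase 1: the hacker obtains signatures on messages of her choice. */
    const char *queries[] = { "transfer $5 to Alice", "hello world" };
    unsigned char sigs[2];
    for (int i = 0; i < 2; i++)
        sigs[i] = sign(queries[i]);

    /* Phase 2: she outputs some NEW message together with a candidate
       signature, and wins if verify() accepts, regardless of whether the
       message is English, XML, or gibberish.  The scheme is existentially
       unforgeable if no efficient hacker wins with noticeable probability. */
    const char *forged_msg = "transfer $1000000 to Eve";
    unsigned char forged_sig = sigs[0] ^ sigs[1];   /* just some guess */
    printf("forgery accepted? %d\n", verify(forged_msg, forged_sig));
    return 0;
}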
Relationship with proven security. Although these notions are related, well-defined security should not be confused with proven or provable security. The adjective "well-defined" refers to the cryptographic concept, such as digital signatures, public-key encryption, or collision-resistant hash functions. The adjective "provable" refers to a particular implementation, such as the Cramer-Shoup or RSA cryptosystem, and means that there exists a mathematical proof showing that if someone can break that implementation, then there is an efficient algorithm for a well-known hard computational problem. Of course, if the cryptographic concept is not well-defined, then there cannot exist such a mathematical proof (since the security property is not a well-defined mathematical statement). Thus having a well-defined cryptographic concept is a necessary condition for proven security.
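To illustrate what such a proof looks like, here is a toy sketch of a security reduction, with assumed details: the attacker is modeled as a routine that extracts square roots modulo N (roughly what breaking a Rabin-style scheme would let her do), and the reduction uses her to factor N. The tiny modulus and the brute-force "attacker" are purely illustrative.

/* A toy sketch, with assumed details, of a security reduction: an attacker
   who can extract square roots modulo N is used to factor N.  The tiny
   modulus and the brute-force "attacker" only exist to make it runnable.  */
#include <stdio.h>
#include <stdlib.h>

static long N = 77;   /* hypothetical public modulus, 7 * 11 */

long gcd(long a, long b) { return b ? gcd(b, a % b) : a; }

/* Stand-in for the attacker: returns SOME square root of y modulo N.
   It simply brute-forces, which only works because N is tiny.             */
long attacker_sqrt(long y) {
    for (long r = 1; r < N; r++)
        if ((r * r) % N == y) return r;
    return 0;
}

int main(void) {
    /* The reduction: pick a random x, hand x*x mod N to the attacker, and
       hope she returns a root r different from x and N-x.  In that case
       gcd(x - r, N) is a nontrivial factor of N (roughly a 1/2 chance per
       attempt for a random x).                                            */
    for (int tries = 0; tries < 20; tries++) {
        long x = 2 + rand() % (N - 3);
        long r = attacker_sqrt((x * x) % N);
        long g = gcd(labs(x - r), N);
        if (g != 1 && g != N) {
            printf("recovered factor: %ld\n", g);
            return 0;
        }
    }
    printf("unlucky this time, try again\n");
    return 0;
}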
Fuzzy security. Fuzzy security is actually the notion used by cryptographers throughout history until the last few decades. By fuzzy security I mean the following process: some guy comes up with some sort of cryptographic algorithm (let's think of an encryption scheme, but one can also think of other examples, such as hash functions or obfuscators). He then makes some vague claims about the security of this algorithm, and people start using it for applications of national or personal security of the utmost importance. Then someone else (a hacker) manages to break this algorithm, usually with disastrous results for its users, and then the inventor or the users either "tweak" the algorithm, hoping that the new tweak is secure, or invent a new algorithm. The distinguishing mark of fuzzy security is not that it is often broken with disastrous outcomes. This is a side effect. The distinguishing mark is that there is never a rigorous definition of security, and so there is never a clearly stated conjecture about the security properties of the algorithm. Another common mark of fuzzy security is keeping the algorithm a secret. This is indeed unsurprising - if you don't know exactly what security your algorithm provides, and you don't really understand what makes it secure, then keeping it secret seems like a good way to at least prolong the time it takes until a hacker finds a bug in it.
I want to stress that I do not intend to discredit the cryptographers throughout history who constructed "fuzzily secure" algorithms. Many of these people were geniuses with amazing intuitions, who paved the way for modern cryptography. However, this does not mean that we need to use their algorithms today. Today, for most cryptographic tasks, we do not need to settle for fuzzy security, since we have very good security definitions, and constructions that are reasonably conjectured to satisfy these definitions. One exception is indeed obfuscators, where so far no one has come up with a good definition of their security properties. Also (and this is not a coincidence) progress in obfuscator research seems to be of the sort mentioned above. One guy builds an obfuscator, an industry uses it, it gets broken, and then they build a new obfuscator (and/or try to pass laws making breaking obfuscators illegal...)
/**********************************************
 * Hopeful adder -                            *
 * Usually returns the sum of its two inputs. *
 **********************************************/
int hopeAdd(int x, int y);

I wish to argue that it actually makes much more sense to use the hopeful adder function in your code than to use a "fuzzily secure" cryptographic module. The reason is that you can test normal-user inputs, but you cannot test a hacker's attack. By this I mean the following: if you use the hopeful adder in your code, run it through many hours of beta-testing, and never come across an input on which it fails, then it is indeed likely that this function will almost never fail when your application is used normally. However, if you use a "fuzzily secure" cryptographic module in your application, and it does not get broken in the beta-test phase, this says nothing about the real-world security of your application. This is because a hacker will most likely attack your application in ways that were not predicted in the beta-testing phase, and will intentionally feed your application exactly the "weird" inputs that cause your module to fail. This is why, in my view, well-defined specifications are actually much more important in security than in other areas of software engineering. Of course, as all programmers know, using rigorously specified components does not guarantee that the overall system will be secure. However, using fuzzily specified components almost guarantees insecurity.
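Here is a minimal sketch of this point in code, with assumed details (the single failing input pair is hypothetical, chosen just for illustration): random beta-testing essentially never hits the bad case, while a hacker aims for it directly.

/* A toy hopeAdd() that fails on one specific, hypothetical "weird" input
   pair, illustrating why beta-testing says nothing about hacker attacks.  */
#include <stdio.h>
#include <stdlib.h>

int hopeAdd(int x, int y) {
    if (x == 123456789 && y == -987654321)   /* hypothetical rare bug */
        return 0;
    return x + y;
}

int main(void) {
    /* "Beta-testing": a million random normal-user inputs, all fine.      */
    int failures = 0;
    for (int i = 0; i < 1000000; i++) {
        int x = rand() % 100000, y = rand() % 100000;
        if (hopeAdd(x, y) != x + y) failures++;
    }
    printf("failures seen in random testing: %d\n", failures);

    /* A hacker, by contrast, deliberately feeds in the exact weird input
       that random testing never exercised.                                */
    printf("on the hacker's input: %d (should be %d)\n",
           hopeAdd(123456789, -987654321), 123456789 + -987654321);
    return 0;
}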
Aren't all systems insecure anyway? It's true that probably all large systems have security bugs, just as all large systems have other bugs as well. However, if one is working with well-defined secure components, it is possible in principle to build secure systems, just as it is possible in principle to build bug-free programs. The only problem is that it is very, very difficult to build large systems that are "perfect" in this way. In spite of this, with time, and with repeated testing and scrutiny, systems can converge to that bug-free state (assuming that this time is indeed spent on fixing bugs and not on adding features - perhaps the best example of this is Knuth's TeX program). Such convergence cannot happen if one is using fuzzily secure components.
Definition 1: A compiler satisfying the functionality and efficiency conditions is an obfuscator if for every program P that is hard to learn from its input/output behavior, a hacker cannot reconstruct the source code of P from the obfuscated version of P.
The counterexample I mentioned above shows that Definition 1 is impossible to meet. This means that if one wants a well-defined and meaningful notion of obfuscation, one needs to come up with a new definition for obfuscators such that (1) the new definition is weak enough so that the existence of an obfuscator meeting it will not be contradicted immediately by our counterexample, but (2) the new definition is strong enough to provide some meaningful sense of security.
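For readers who have not seen it, here is a toy sketch of the flavor of that counterexample, with assumed details (the actual construction in [1] passes around genuine program descriptions and is considerably more careful):

/* A toy sketch, with assumed details, of the kind of counterexample used
   in [1].  ALPHA and BETA are hypothetical hard-wired secrets.            */
#include <stdio.h>
#include <string.h>

#define ALPHA "open-sesame"   /* hypothetical secret input  */
#define BETA  "treasure"      /* hypothetical secret output */

/* First program: outputs BETA on the secret input ALPHA and "" otherwise.
   From input/output behavior alone it looks just like the program that
   always outputs "", so it is hard to learn by querying it.               */
const char *c_ab(const char *x) {
    return strcmp(x, ALPHA) == 0 ? BETA : "";
}

/* Second program: given some program q, outputs 1 iff q maps ALPHA to BETA.
   (Here q is modeled as a function pointer; in [1] it receives an actual
   description, i.e., code, of a program.)                                 */
int d_ab(const char *(*q)(const char *)) {
    return strcmp(q(ALPHA), BETA) == 0;
}

int main(void) {
    /* A hacker holding ANY working code for c_ab, however garbled, can run
       d_ab on it and learn something she could essentially never learn
       from black-box queries alone.                                       */
    printf("%d\n", d_ab(c_ab));   /* prints 1 */
    return 0;
}

This toy of course collapses the subtleties of the actual argument, but it conveys why "obfuscated" source can still leak things that black-box access never would.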
Now, the question of making obfuscators with well-defined security becomes the question of coming up with a definition satisfying these properties (1) and (2). It seems to me that there are two natural ways to get such a definition:
Thus it seems that the paper [1] does present very serious challenges to anyone wishing to promote the state of obfuscators from its current "fuzzy security" level to well-defined security. As a final note I want to mention that there are several interesting concepts related to obfuscators, and for many of them we have more open questions than results. The interested reader is referred to [1] for more details.