Solving World's Greatest Mathematical ProblemsSun, 6th Feb '05, 3:05 pm::
So I spent this entire weekend trying to discover the world's first 30-digit Keith Number with no luck. It's kinda like Fibonacci Numbers. Basically, take any number and write out it's digits. Say we pick the number "123" and then write out 1,2,3. Now add all the digits up and put the sum at the end of the list. 1+2+3 = 6, so now the new list is 1,2,3,6. Now drop the first number and add the rest and put it at the end of the list again. Drop "1" and add 2+3+6 = 11. So now we have 2,3,6,11. Of course, you can keep doing this till infinity. Now the fun part happens when you pick a number like 197. Let's see: 1,9,7. Then 1+9+7 = 17. So we get 1,9,7,17. Dropping "1" and adding the rest we get 9+7+17 = 33. Drop 9 and adding 7+17+33, we get 57. Then next we get 17+33+57 = 107. Next we get 33+57+107 = 197 -> The same number we started with!
So if we get the same number as we started with, then it is a Keith Number. There are only two 3-digit Keith Numbers - 197 and 742. There are six 2-digit KNs - 14, 19, 28, 47, 61, 75. Try it for 47 -> 4,7,11,18,29,47! But it doesn't work for any other numbers. 48 -> 4,8,12,20,32,52... No "48" in this list. A few 4-digit KNs are: 1104, 1537, 2208, 2580 etc.
Keith Numbers are very rare and only about 90 of them are known. The largest known Keith Number is 988242310393860390066911414 (27-digits long). We don't even know if there are infinitely many Keith Numbers (though it seems so). There is no award or anything for finding new large Keith Numbers but there is a huge prize (and worldwide glory) for finding out a simple method for generating all Keith Numbers easily. Basically, the problem of finding Keith Numbers can be likened to solving the Knapsack Problem. Of course, if you solve the Knapsack Problem with a simple method, then you've just discovered a method for solving the P=NP problem which has a $1million award from Clay Mathematics Institute.
While I don't understand the whole P=NP problem as clearly as I want, I do understand it enough to be amazed by it. The biggest mystery in computer science right now is whether P is equal to NP or not. So what is P and NP? First of all, Computer Science (my primary major in college) is not about how to make PowerPoint presentations or write financial database software for the stock exchange. In the simplest terms, Computer Science is very much like pure Mathematics - studying problems and solving them using simple equations. Of course, if it was just solving math problems, then it would be called Math and not CompSci.
Computer Science normally deals with a different type of problems, that which involve a lot of calculation and especially estimating the time it takes to calculate. E.g., if someone gives you 10 different business cards, how do you sort them alphabetically by a person's name? Well one simple way is to put all the cards on a table and go through the list and pick out the ones that begin with A, then B, then C till you are done. You basically went through a list of 10 cards, 26 times (A to Z - 26 letters). Of course, you'd be a lot less than that, since everytime you pick a card, there's one less left in the pile. Anyways, this is a very simple sorting algorithm. Computer Science deals with sorting algorithms a lot.
Another thing is searching. If someone gives you 10 business cards and says find "Chirag Mehta." Well, you'll have to go through 10 cards at most to find mine. If it's a one-time thing then it's fine but if you often have to find a business card from a bunch of business cards, you're much better off alphabetizing them once, so that next time someone asks for "Steve Buscemi", you can directly jump to the "S" pile instead of going through all the cards. Computer Science also deals with searching and inventing better methods to find stuff from a bunch of stuff. This is why people love Google so much - it makes easy to find stuff, using their own special search methods. You just type in "flying pigs" and it looks through billions upon billions of documents to find pages which have the phrase "flying pigs" in it and returns those pages within milliseconds. Amazing isn't it!
So what does all of this have to do with P=NP problem and the million-dollar award? Well, after decades of research, computer scientists have found out that there are some problems like searching, sorting etc. whose answers can be solved within a fixed amount of time - seconds, hours, days, centuries whatever. Basically, if someone tells you to find "Chirag Mehta" from 1000 business cards, and you can read 10 a minute, it'll take you 100 minutes at most to find the card. If it was 1000 billion cards, it would take you 100 billion minutes. So the problem of searching has a fixed execution time and it can be very accurately estimated. Same goes with sorting.
However, there are tons of other problems out there, where we can't even estimate how long it's going to take to solve the problem, let alone solve the problem itself. For example, let's look at the Travelling Salesman Problem. Suppose you have to go on a business trip to 3 cities and you can go from any city to any city, what is the cheapest round-trip route that visits each city once and returns to your home city? In other words, if you live in Mumbai, and have to go to London, Beijing, and Los Angeles, what is the cheapest round-trip route that will visit each city once and end up at Mumbai? Of course, you'd need a list of all the airfares, say Mumbai to London (or back) = $100, Mumbai to Beijing = $50, Mumbai to Los Angeles = $200, London to Beijing = $50, London to Los Angeles = $100, Beijing to Los Angeles = $150. Let's assume the reverse fare is same in all the cases, so London to Mumbai equals Mumbai to London equals $100. Currently there is no way accurately find the cheapest route without going through all the combinations (Mumbai-London-Beijing-LA-Mumbai, Mumbai-Beijing-LA-London-Mumbai, Mumbai-LA-London-Beijing-Mumbai etc.)
So you think what's the big deal about having a simple computer go through all the 20-25 combinations and finding the best? Actually no big deal, until you increase the cities from 5 to 50. Or 50 to 5000. Then even the largest supercomputers in the world combined would take centuries to find the solution. Basically, there exists no way to predetermine how long it will take to solve the travelling salesman problem for x cities. We can't say if x = 5, then it'll take 5 seconds and if x = 10, it'll take 10 seconds. We can't even say if x = 5, then it'll take 25 seconds and if x = 10, it will take 100 seconds. Truth is, we don't know. Of course, if you are British Airways, you do want to know, otherwise how else are you going to schedule the flight route of 500 airplanes around 200 airports around the world? Even though the perfect solution doesn't exist, computers can do a pretty good job of going through a billion or so combinations and finding out the best routes. Yes, they cannot go through all the possible combinations, but in practical cases, going through a few million combinations results is pretty good, say as opposed to the 1950s when all the flight pattern/routing was done manually! Without the special flight-routing computers, the whole airline industry would be in chaos because nobody would be able to schedule flights between cities where there is most demand, most people, highest airfare etc.
While the airlines are using computers to plan flight routes, they are doing it as best as they possibly can, not absolutely the best. It's like me going out and asking 100 girls on a date and deciding to marry the one I think is the best. This doesn't mean she is "the one" because I did not ask the other 3 billion girls. It just means she's the best of the ones I asked. Similarly, airlines plan their flights out of the 10 billion combinations instead of the 10000000 trillion possible options. Still 10 billion is better than 100,000. And yet, it is not as good as 10000000 trillion. So there is absolutely a LOT of room for improvement.
Another similar problem called the Bin-Packing Problem, which is closely related to the Cookie-Cutter Problem. If you have 10 items and you need to pack them into 3 bins, such that no bin weighs more than 100lbs, what is the best way to do it? Of course, trial-and-error might work with 10 items and 3 bins, but what if you are the world's largest steel manufacturing company and want to make sure that you can ship the most amount of steel bars of different sizes into trucks while never putting more than 10 tons per truck and of course not overflowing any truck's container. Or if you are my mom and are making cookies. You make a big rectangle piece of dough and then use a cutter to cut fancy cookie shapes. How do you make sure you use as much of the dough as possible and not waste any? Or if you are the world's biggest cookie company and have the same problem. How do you solve it?
Turns out, there is no exact "optimal" way to solve this problem in a fixed or predictable amount of time. In other words, there is no polynomial time method to solve any of these problems. And now comes the beauty of computer science -> all of these problems, from finding Keith Numbers to Knapsack Problem, to Travelling Salesman Problem to Cookie-Cutter Problem can all be solved if even one of them is solved! Oh and the game of Minesweeper also falls into this list. So if you can find a perfect way to beat Minesweeper each time, guess what... you've just solved the world's greatest computer science problem and would most likely get the $1mllion award.
Well I've still not explained what P=NP really means. P means that the problem is easy to solve. Easy doesn't mean you can do it in your head. Easy means it is possible to do it in a fixed time, like searching, sorting etc. NP means that a solution to a problem is easy to verify. That is, if someone gives you a Keith Number, you can easily check in a specific amount of time whether it is a Keith Number or not. Or whether the flight-route from Mumbai-London-Los-Angeles-Beijing-Mumbai covers all the 3 cities and starts/ends at Mumbai. So what we don't know, is whether problems that are easy to verify can be solved easily or not. That is whether NP = P. Currently there is no known method for solving any of the problems I stated above in a fixed/predetermined amount of time.
Most scientists believe that P is not equal to NP and a lot of them believe that given the current state of knowledge in computer science and mathematics, we can't really answer this question. We need to learn a lot more and look at this P=NP problem from the different angle in order to solve it. Maybe that's where computers that work using Quantum Mechanics could help. I don't know much about Quantum Computing so I guess that I can take care of next weekend :)
Hope you enjoyed my little computer science introduction course. Oh and according to my calculations, looking for a 30-digit Keith Number involves going through about 10^30 (10 followed by 30 zeroes) numbers. Currently, I can verify over a hundred 30-digit numbers per second, about 10 million a day. That's 10^7. So it will take me 10^30/10^7 = 10^23 days to find all the 30-digit Keith Numbers. Suppose a million people join me in finding the 30-digit KNs, and 1 million computers run my software, it'll still take 10^23/10^7 = 10^16 days. Which is about twice the age of the Universe! Of course, if we involve all the 1 billion computers on this planet, and instead of doing 10 million a day, find a way to speed it up to 100 billion a day on each computer, then it'll take 10^30/10^11/10^9 = 10^10 days, which is still approximately 27 million years.
So how the hell am I gonna find a 30-digit Keith Number (if it even exists)? By pure luck :) My program just randomly generates 30 digit numbers and tests to see if they are KN or not. It has gone through about 8 million 30-digit numbers already. Of course that's like looking into a bucket of seawater from the Florida coast order to find the pearl necklace you threw into the Indian Ocean a million years ago. There's a VERY high probability that you won't find anything, but hey... who knows, you just might!
Oh and also I think there's more chance of me winning the World's Largest Powerball Lottery than finding a 30-digit Keith Number, even though I haven't even purchased a lottery ticket! But I guess it's good for my computer since it's getting some good exercise now :)