

ARCHIVE: The changing definition of 'Scale'

Tue. Jun 7th 2005, 09:07pm:

When you hear a bunch of marketing folks talking about 'scalable' (used interchangeably with 'solution' or 'enterprise'), you're in a position to make or lose a lot of money. These are the people who claim that you should forget everything you learnt back in the 1980s because the latest TLA technology will revolutionize the way we do business and live in a global community. While I understand that technology is changing the world every moment as it becomes more ubiquitous and transparent, I don't necessarily think that most technological innovations are major breakthroughs like the steam engine or the electric light bulb. Most are probably just another way to store data onto a magnetic tape or an optical drive. Big deal.

Nevertheless, there is one aspect of today's technology that I find very appealing and, in a way, somewhat surprising. Scale. Size. Ginormousness. Fortunately for us today, we don't have to start big, we don't have to invest millions of dollars in technology, and we don't really have to sell our house to buy a storefront for our small business. Just a decade ago, scale was what separated the Sears and Walmarts from garage and yard sales. Not anymore; you can start relatively small and end up like Amazon, eBay, or Google, provided you have the business and technology skills, of course.

I remember growing up as a naive programmer who couldn't believe that my 486 DX2 wasn't enough to run a database program for an accounting firm. I'm probably still as dumb, because I don't understand why I would need to buy a $100,000 system to process orders and ship goods while making sure the financial records are kept up to date and the production schedule is optimal. And yet, just a decade ago, most medium-sized businesses spent a fortune on IBM and Sun servers to do something a few clustered Linux PCs with the appropriate software can do today. Of course, not one person who actually worked on these behemoths or, better yet, made money selling them will agree with me when I say that there is nothing a good distributed network cannot do for a tenth of the cost of these extinct Titans.

Sure, there are some things for which you need raw processing power or sheer storage space, but once again, nothing a small or medium-sized business would need. Unless, of course, you are in the business of crunching numbers or collecting data, but even then you're much better off doing it the Google way, with racks of cheap commodity PCs, than keeping IBM happy. Well, IBM is certainly not happy at the moment with how things are going in the almost-free world of Linux/Apache/MySQL/PHP (LAMP). Obviously, if any group of moderately skilled programmers can put together an entire database application that works over the Internet, requires minimal hardware, and is customized to do everything a business needs, then nobody's going to buy DB2. Sure, DB2 may have features that no open-source database currently has, but if the LAMP-style alternative meets 95% of the business's needs, why not use it instead?

The debate is obviously not this clear-cut. The folks in the big-database, big-server camp throw around buzzwords like 'Scalable' and 'Enterprise-ready' to look down on the grassroots developers who use free and open-source tools to build applications. It sounds a lot like the never-ending battle between big corporations and dirty hippies; you don't want to end up at either extreme, actually. Obviously, an Access database connected to a web front end is NOT scalable, and neither is a non-normalized MySQL or PostgreSQL database. However, when designed well, PostgreSQL can power large companies more cheaply than its commercial counterparts.

So what exactly is 'scalable', you ask? A system is scalable if it can be expanded to support orders of magnitude more load than it was originally designed for. This web server can support about five hundred users before it heats up; it cannot support 100,000 concurrent users at the moment, no matter how much RAM I add. But does that mean it's not scalable? Certainly not. All I need is a load balancer in front that spreads incoming requests across as many web servers as I can afford, all of them reading from a cluster of database servers in the back. Give me ten grand or so and this website can easily survive a major Slashdotting or a good Farking. By that definition, the LAMP setup is most certainly scalable.
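To make the front-end piece of that setup a little more concrete, here is a minimal Python sketch of round-robin dispatch, one common way a load balancer spreads requests across web servers. It assumes identical, always-healthy backends; the class name and host names are hypothetical, and a real balancer would also handle health checks and session stickiness.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next web server in turn (hypothetical sketch)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one backend server")
        self._servers = list(servers)
        # cycle() repeats the server list forever: web1, web2, web3, web1, ...
        self._rotation = cycle(self._servers)

    def route(self, request):
        # Pick the next server in the rotation. A production balancer would
        # also skip servers that fail a health check; this sketch does not.
        server = next(self._rotation)
        return server, request


if __name__ == "__main__":
    web_servers = ["web1", "web2", "web3"]  # hypothetical host names
    balancer = RoundRobinBalancer(web_servers)
    for req_id in range(5):
        server, _ = balancer.route(f"GET /page?id={req_id}")
        print(req_id, "->", server)
```

With three servers, five requests land on web1, web2, web3, web1, web2 in that order. Round-robin is the simplest policy; weighted or least-connections schemes are better suited to backends of unequal capacity.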

However, nothing is perfect, and neither is the LAMP-style setup. (By LAMP-style, I mean you can replace MySQL with PostgreSQL or Firebird, PHP with Python or Ruby, etc.) There are a lot of things that aren't yet possible on Linux or Windows and can only be found in the big IBM, HP, or Sun servers. Many of the largest computers meet stringent DoD security requirements and have reached maturity after decades of fine-tuning at a cost of billions. It is foolish to claim that an operating system designed a mere decade ago can outperform, in every respect, something that the smartest minds at NASA, the DoD, and the NSA worked on for three times as long. Of course, the big servers have their place and are definitely worth the expense when you need to make sure that trillions of dollars of stock exchange transactions happen flawlessly every day, or that the power supply for ten states is monitored and maintained without downtime.

But you don't need a $100k server to provide fifty users with groupware (e.g. project management, budgeting, accounting, email, document management). For that, there are LAMP and tons of other free alternatives. And the greatest thing about the scalability of LAMP-style platforms is that when you need to go from 50 to 500 to 5,000 to 50,000 users, the costs increase arithmetically (roughly in step with the number of machines you add), not geometrically, and definitely not exponentially.
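The arithmetic behind that last claim can be sketched with a simple scale-out cost model: capacity grows by adding identical commodity nodes, so total cost is a fixed amount plus a per-node price times the number of nodes. All the figures below (500 users per node, $2,000 per node, $1,000 fixed for the load balancer) are hypothetical placeholders chosen for illustration, not real prices or measurements.

```python
import math


def scale_out_cost(users, users_per_node=500, cost_per_node=2_000, fixed_cost=1_000):
    """Total hardware cost when capacity grows by adding identical nodes.

    All default figures are hypothetical placeholders, not real prices.
    """
    # Round up: a partially used node still has to be bought.
    nodes = math.ceil(users / users_per_node)
    return fixed_cost + nodes * cost_per_node


if __name__ == "__main__":
    for users in (50, 500, 5_000, 50_000):
        cost = scale_out_cost(users)
        print(f"{users:>6} users: ${cost:>9,}  (${cost / users:,.2f} per user)")
```

Under these assumptions the total grows in proportion to the node count, so each tenfold jump in users costs roughly ten times as much hardware, while the cost per user levels off near the per-node price divided by users per node ($4 here) instead of climbing.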