Nick Kallen of Twitter, author of popular open-source projects such as NamedScope, Screw.Unit, and Cache-Money, gave a compelling talk yesterday at the Golden Gate Ruby Conference.  Nick’s easy-going presentation style and thoughtfully prepared examples made a complex topic easy to follow.  Nonetheless, my notes are sparse, since most of my attention was devoted to listening and absorbing.

“Your website is a distributed network service.”

While the talk was entitled “Magic Scaling Sprinkles,” Nick dispelled the idea that any magic technology would solve scalability problems; instead, he shared some fundamental concepts of computer science underlying scalability:

  • distribution
  • balance
  • locality

Two important concepts: throughput and latency

For example: one worker is able to complete a job in 1 sec. 1 job/sec is the throughput (the number of jobs per unit time). 1 sec is the latency (the elapsed duration from the start of a job to its end). Latency is an efficiency question; throughput is a scalability question. The focus of this talk is on scalability.

Nick wrote a very simple echo server and ran a load test. Then he added:

100000.times { Time.now }    # represents an intense loop: memory allocation + a system call
sleep rand * 3               # an effective representation of blocking I/O

The complete code is on GitHub.
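For reference, a minimal sketch of what such an echo server might look like (my illustration, not Nick’s actual code):

    require 'socket'

    server = TCPServer.new(9000)      # hypothetical port
    loop do
      client = server.accept
      while line = client.gets
        100000.times { Time.now }     # the intense loop: memory allocation + a system call
        sleep rand * 3                # the simulated blocking I/O
        client.puts line              # echo the line back
      end
      client.close
    end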

How many can we run in parallel? How many can we run per machine? How many can we run per core?

The code uses a statistics library, statosaurus, that they use at Twitter (Nick’s GitHub example contains a version of statosaurus that he says includes the key parts of the proprietary Twitter package). Recommendation: log everything extensively, and thread transaction ids throughout your logs (in SQL queries, HTTP headers, etc.). This is essential for tracking down failed distributed transactions.
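As a rough illustration of the idea (my sketch, not the statosaurus API):

    require 'logger'

    LOGGER = Logger.new(STDOUT)

    # Stamp every log line for a request with the same transaction id so a failed
    # distributed transaction can be traced across machines later.
    txn_id = rand(2**64).to_s(16)     # hypothetical id generation
    LOGGER.info("[#{txn_id}] request started")
    # ... do the work, passing txn_id along in SQL comments, HTTP headers, etc. ...
    LOGGER.info("[#{txn_id}] request finished")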

For this example, he logs the following:

  • a time stamp
  • a transaction id
  • wall-clock time: amount of elapsed real time
  • system time: amount of time the process has spent in the CPU while in kernel mode
  • user time: amount of time the process has spent in the CPU while in user mode

Note: system time + user time < wall-clock time, since there is wait time (simulated here by sleep), or since there are too many processes on the machine at the same time for the number of cores, so your process is waiting in the “run queue.” That latter case, excessive context switching, is what we want to investigate.

Generally if we take the wall-clock time and divide by (system time + user time) we get the optimal number of processes per core. This leads us to a distribution strategy.
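A sketch of that arithmetic, using Ruby’s standard Benchmark library rather than statosaurus (the numbers are illustrative):

    require 'benchmark'

    tms = Benchmark.measure do
      100000.times { Time.now }       # CPU-bound work
      sleep rand * 3                  # simulated blocking I/O
    end

    puts "user:   #{tms.utime}s"
    puts "system: #{tms.stime}s"
    puts "wall:   #{tms.real}s"

    # wall-clock / (system + user) ~= how many such processes one core can keep busy
    puts "processes per core: #{(tms.real / (tms.utime + tms.stime)).round}"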

Nick talked about different mechanisms for distribution: TCP Proxy, DNS (compelling for a chatty protocol), client (has some serious drawbacks for maintenance/upgrades). In this case, proxy is an optimal solution.

Use a strategy of “least connections,” a.k.a. “by busyness,” which is more effective than round robin.

True efficiency: never do the same work twice.

Locality: analogy to tape drive, where if you write close to where you last wrote or read, then it will be significantly faster due to spatial locality. The same applies to hard drives and databases.
Caching exploits spatial locality by keeping data near the code that uses it. Route requests to the processes where the data is most likely to be cached already. Sticky sessions can be an essential technique.
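A crude illustration of locality-aware routing (my example, not from the talk): always send a given user to the same backend, so that backend’s cache probably already holds their data.

    BACKENDS = ["app1:9000", "app2:9000", "app3:9000"]   # hypothetical backends

    def backend_for(user_id)
      BACKENDS[user_id.hash % BACKENDS.size]   # same user => same backend every time
    end

    backend_for(42)   # => always the same entry for user 42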

Carl Lerche talked about how to write fast Ruby code. Yes, Ruby is scalable. Scaling != speed. The focus of this talk is on speed. Ruby is fast enough for the vast majority of use cases.

“Slow code is your fault.”

How can I write fast code?
1. Write slow code: well-structured code that is easy to read. Don’t worry about performance the first time around. You can’t tell from the beginning what matters.
2. Use the scientific method.

  1. Define the question
  2. Gather information: where is time/memory being spent?
  3. Form a hypothesis: why is this chunk of code slow or a memory hog?
  4. Perform an experiment and collect data
  5. Publish results (restart if needed)

You need specific questions, like: “Why is action X taking 600ms?” “Why is 60% of a Merb dispatch cycle spent in content negotiation?” “Why are my Mongrel instances growing to 300MB of memory?”

Some useful tools:

  • RBench
  • ruby-prof to generate profile data / KCachegrind for reading it (see the sketch after this list)
  • EXPLAIN ANALYZE / log files
  • New Relic / FiveRuns
  • memory_usage_logger
  • Bleak_house (memory leaks)
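For example, a minimal ruby-prof session might look like this (a generic sketch, not from the talk); RubyProf::CallTreePrinter instead produces output that KCachegrind can read:

    require 'ruby-prof'

    result = RubyProf.profile do
      10000.times { "some" + " string " + "concatenation" }   # suspect code under test
    end

    RubyProf::FlatPrinter.new(result).print(STDOUT)   # flat report of where time went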

Ruby’s garbage collector is a conservative mark-and-sweep collector.  When it runs, all your code stops.  Each run can take 50-150ms.  It triggers before grabbing more system memory (every 8MB).

Avoid creating unnecessary objects.  Understand the difference between Ruby methods (e.g. the difference between reverse! and reverse).
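For instance:

    s = "hello"
    s.reverse     # allocates and returns a new String; s is unchanged
    s.reverse!    # reverses s in place; no extra String is allocated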

DataMapper’s identity map is pretty awesome.

Beware of modifying large strings.
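One common case (my example, not from the talk): appending with += copies the entire, ever-growing string into a new object each time, while << appends in place.

    s = ""
    1000.times { s += "chunk" }   # each += allocates a new String holding a full copy
    s = ""
    1000.times { s << "chunk" }   # << mutates s in place, avoiding the repeated copies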

Don’t concatenate strings at runtime just to pretty-print them across lines in your source.  Do this instead (adjacent literals are joined at parse time when you continue the line with a backslash):

     s = "Here is my long string" 
           " that continues"

Beware of closures.
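One closure cost to keep in mind (my example; the talk did not elaborate in my notes): a block or lambda captures its entire surrounding binding, which can keep large objects alive longer than you expect.

    def make_callback
      big = "x" * 10_000_000    # large local variable
      lambda { :done }          # the lambda's binding keeps `big` reachable
    end

    callback = make_callback    # ~10MB stays alive as long as `callback` does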

No code is the fastest code.  Be lazy.  Don’t run code till you have to.
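A common way to be lazy in Ruby is memoization, deferring expensive work until (and unless) it is actually needed; a generic example of mine:

    class Report
      def data
        @data ||= expensive_query   # computed on first access, reused afterwards
      end

      private

      def expensive_query
        sleep 2                     # stand-in for a slow query
        [1, 2, 3]
      end
    end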

“Compiling your code.” Iterating is slow.  Ruby’s AST is fast. (This is a little crazy, but sometimes you need it.)
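Presumably this is in the spirit of what ActiveRecord does with generated attribute methods: build plain Ruby methods once with class_eval, so calls hit the already-parsed AST instead of iterating or hitting method_missing every time. A hedged sketch (my code, not from the talk):

    class Record
      ATTRIBUTES = [:name, :email]

      ATTRIBUTES.each do |attr|
        class_eval <<-RUBY          # generate a reader per attribute, once, at load time
          def #{attr}
            @attributes[:#{attr}]
          end
        RUBY
      end

      def initialize(attributes)
        @attributes = attributes
      end
    end

    Record.new(:name => "Nick").name   # => "Nick"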

Make sure you have great tests, then when you optimize you can make sure you didn’t break anything.

Panel discussion at Golden Gate Ruby Conference

Shoes, Tim Elliott: a framework for creating GUI apps. It is an application that embeds Ruby. It is written in C. It is designed to lower the bar for programming and make it fun. It is not an MVC framework; writing an app is more like writing a script. It is written to be compiled and shared with your friends.

RAD, Greg Borenstein: a framework for doing open-source hardware hacking. RAD is a framework for programming the Arduino physical computing platform using Ruby. The Ruby code is converted into C code, which is then compiled and run on the Arduino microcontroller. He is working on a test suite that comes with a shoebox full of hardware, so you can check whether things blink or bleep in the right order to see if your tests pass.

Adhearsion, Jay Phillips: a way of building telephony applications. You call into a phone number, and Ruby code services the phone call. The first app he wrote used RAD so that he could unlock his door with a phone call; he says everyone should go out and buy an Arduino controller and a bunch of LEDs and build something fun tonight.  The interesting thing about Ruby is that it allows you to “play with other people’s code.” He described a plugin system that was actually adopted from RAD.

Sinatra, Blake Mizerany, creator of Sinatra.  Sometimes MVC is too much.  Ruby is great for this.  Closures are awesome.  As Rack grows, Sinatra has been getting smaller.  Sinatra is a really strong Rack citizen.

Merb, Yehuda Katz, lead maintainer (not talking about Merb).  The hard thing about maintaining a framework is that it starts with a clear mission, but as people build apps with it, there are requests where it’s hard to tell whether a given request is pushing application code into the framework.  The best thing about Ruby is that all code is executed code; you can define methods anywhere.  What is hard about Ruby?  It isn’t a slow language, but nothing is free.  The challenge is how to write code that is lightweight yet robust.

Rails, Josh Peek, from the Rails and Rack core teams.  … He is interested in seeing how code can be shared between frameworks to strengthen the ecosystem.

What features of the Ruby language make it effective for frameworks? Metaclasses and closures (e.g. ActiveRecord); “I don’t consider languages without closures to be powerful languages” (Yehuda Katz); defining methods on the fly; open classes; the community (grass-roots, people agree that they want to share code, which is unusual; agility in the community: moving to Git and GitHub, Rack, RubyGems, RSpec, test-driven development as part of the project).  The agility of the community was attributed to the agility of the language.

Is there anything about Ruby that encourages open source? The fact that it is a scripting language: it is hard to hide your code.  The fact that Rails and Ruby are MIT-licensed, so corporations aren’t afraid to use it.  Even if 90% don’t give back, it increases the number of people who do.  It makes people feel free to try stuff out and modify it (and the fact that there are tests helps!).  There is a high level of interoperability with the “host language” for the different Ruby implementations.

“If you are writing a framework, you should be writing it in the same way that you recommend people write plugins.  It’s really hard, but you have to do this.” — Yehuda Katz

“Allowing people to write tests for their plugins is essential.” — Jay Phillips

What about the proliferation of Ruby implementations in different languages? Kick-ass, as long as we keep holding the implementations to a high standard of compatibility.

Why was GitHub so successful?  Why did so many projects move to Git and to GitHub so quickly? The main benefit of GitHub is the social-network aspect.  When you put your code up on GitHub, you aren’t creating an open source project, you are just sharing your code.  This decreases the overhead and increases people’s contributions.  “I think it has totally revolutionized the way people create open source software.” (Jay Phillips)  Moving to Git lets you make really large changes and merge them back.  Things that are possible in Git would have been impossible in SVN: you would fork forever instead of merging back in. (Yehuda Katz)

How do we get the community to move to Ruby 1.9? Get Rails onto 1.9.  Why do you want the community on 1.9? Speed improvements.  Yehuda Katz: if you are doing something computationally expensive, you might want to be on 1.9.  I benchmark everything; usually 1.9 is 2x and JRuby is 2.5x, but 1.9 has outliers of slowness.  I don’t think there is a huge benefit to the community in moving to 1.9 (but I do think it is important that we all move forward).  Jay Phillips: when JRuby and all the gems move to 1.9, the community will.  If I switch to 1.9 syntax, I will break JRuby, and I can’t do that.