April « 2009 « the evolving ultrasaurus

Josh Susser, organizer of the Golden Gate Ruby Conference, introduced this talk by saying that he wanted to have talks that weren’t only about how we do what we do, but also about why we do what we do. Gregory Miller’s talk “Trust the Vote: An Open Source Digital Public Works Project” addressed a significant issue for the United States and for democracy in general, and spoke to why the open source process that is so alive in the Ruby community is critical to solving many of the problems of our day.

Here in the U.S., we have some serious problems with our voting system:

No Federal Guidelines about how votes are counted
No assurance in California that absentee votes are counted
Problems with the voting machines and the companies who make them
- Most voting machine software is written for Windows 95
- Only 4 vendors create voting systems in the US, and there may be only 2 by the end of the year.
- Very high barrier to entry: certification costs millions of dollars, which is actually a disincentive to innovate
- Natural conflict of interest: shareholder interest collides with public interest. The companies have a fiduciary responsibility, so the shareholder interest always trumps the public interest.

The Open Source Digital Voting Foundation is seeking to fix this “critical democracy infrastructure.” We should consider it a “digital public works project” since it is so imperative, creating transparency by using an open source approach. They are creating an endowment to support a public technology repository.

“Sunlight is the best disinfectant”

Overview:

Dev process: core team essential for continuity
RFC Services: similar to IETF process
Design Congress: state elections directors around the country in a virtual community to drive the business requirements
Federal Certification undertaken by the non-profit

All the software will be dual-licensed under a Public Development License and a Commercial Deployment License, so that it can be easily adopted by corporations. The goal is to create transparency by using an open source approach, and actually build things that we can see, touch, and try.

Major work areas (*=Ruby on Rails projects):

Digital voter registration system*
Ballot design*
Ballot casting and counting
Election management* – back office web app for supporting the admin tasks of an election, including district data
Operating system platform: they are building on “commodity” hardware and components, but for some customers who are seeking additional security features they are collaborating on open source hardware with Intel and AMD

A good portion of the work is Rails-based, with Pivotal Labs as a development partner. He also noted that they are in the process of putting together a “core team,” recently joined by Aleks Totić, one of the original Netscape engineers (a very smart, practical guy whom I had the opportunity to work with in ’95 tracking down bugs in the Netscape Plugin API when I worked on Shockwave).

After the talk, I got the chance to speak to John Sebes, OSDV CTO. He noted which projects are being implemented with RoR (see * items above). Some of the web apps they are building will solve fairly simple technical problems, but they answer a huge need. The folks who run the elections generally work with very poor quality software with awkward UI that can lead to errors. For example, one might think that putting together a ballot would be straightforward, but there are countless examples of very basic design flaws, which could be remedied by some relatively simple, effective software. He told a story of the election of Rush Holt, who was fortunately uncontested, yet the ballot made it very hard to tell the intent of the voter:

I can imagine all sorts of ways that ballot design in general could be improved for usability in addition to fixing outrageous bugs in the system like the one illustrated above. As a voter and open source developer, I am very excited about this project.

There are lots of ways to get involved. Join their new Facebook group to stay in touch.

We just heard a fantastic talk by Jacqui Maher about her work on the Baobab project, fighting AIDS in Malawi, Africa.

First, she gave us an overview of the AIDS epidemic, especially in Africa:

Africa has 12% of the world’s population, but 60% of the people with AIDS
In Malawi
- 14% of adults have AIDS
- 8 people die every hour from AIDS
- there are 280 doctors
- 3500 HIV/AIDS patients per doctor

When she arrived, patients would wait in long lines to see a doctor and patient intake would typically take 15 minutes. It was all paper-based and error-prone. In Malawi, they have a national ID program where every ID card has a bar code. This could be used for easy patient intake. After they developed the hardware/software solution, it would take less than 1 minute to register new patients and less than 10 seconds for returning patients to get through the intake process.

The solution was designed to help in a number of areas:

Patient Registration: entering new patient data, generating a national ID bar code, or scanning an existing one
Encounters: any patient interaction
Observations: diagnosis, progression, vitals, patient complaints, drug regimen
Prescriptions: drugs, ingredients, inventory, etc.

They overcame challenges with spotty internet connections and low bandwidth. They use a wireless mesh network, which is self-healing. The portable computer they used was based on the I-Opener (initially bought from the US on eBay, then 2000 were donated), which was hacked to include a touchscreen, Ethernet, PoE (Power over Ethernet), and a bar code scanner. The software is Ubuntu, Ruby on Rails, and MySQL.

More details on the software:

BART – Baobab Anti-Retroviral Treatment
OpenMRS Data model
templating using ERB
App calls via AJAX
RSpec tests

Jacqui told a great story about Gem the Janitor (yes, that is his real name), who just picked up the device during a rush when all of the nurses were busy, figured out the interface quickly, and started helping register people. Now he runs the whole intake process.

Why RoR?

great community
common consensus on best practices
active contributions to OSS
very accessible information on every part of the stack
superb interactive tutorials like PeepCode
Ruby is easier to learn offline than other languages, and comes with documentation
ActiveRecord: makes complex data models easier

Now 265 of the 280 doctors are using this app. The data collection enables extensive reporting, enables agencies to use the data to focus research & funding, and influence policy decisions.

You can help!

http://github.com/baobab/
the developers are on IRC: freenode #baobab
more info: www.baobabhealth.org

Nick Kallen of Twitter, author of popular open-source projects such as NamedScope, Screw.Unit, and Cache-Money, gave a compelling talk yesterday at the Golden Gate Ruby Conference. Nick’s easy-going presentation style and thoughtfully prepared examples made a complex topic easy to follow. Nonetheless, my notes are sparse since most of my attention was devoted to listening and absorbing.

“Your website is a distributed network service.”

While the talk was entitled “Magic Scaling Sprinkles,” Nick dispelled the idea that any magic technology would solve scalability problems; instead, he shared some fundamental concepts of computer science underlying scalability:

distribution
balance
locality

Two important concepts: throughput and latency

For example: 1 worker is able to complete a job in 1 sec. The throughput (number of jobs per unit time) is 1 job/sec. The latency (elapsed time from the start of a job to its end) is 1 sec. Latency is an efficiency question. Throughput is a scalability question. The focus of this talk was on scalability.
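To make the distinction concrete, here is a minimal sketch of the arithmetic. The helper name and the worker counts are my own illustration, not from the talk, and it assumes workers run fully in parallel without waiting on each other.

```ruby
# Hypothetical helper: jobs completed per second across all workers,
# assuming each job takes latency_seconds and workers never block each other.
def throughput(workers, latency_seconds)
  workers / latency_seconds.to_f
end

one_worker   = throughput(1, 1.0)  # the example from the notes: 1 job/sec
four_workers = throughput(4, 1.0)  # more workers raise throughput...

# ...but each individual job still takes 1 second from start to finish.
puts "1 worker: #{one_worker} jobs/sec, 4 workers: #{four_workers} jobs/sec"
```

Adding workers changes the throughput; making a single job faster changes the latency.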

Nick wrote a very simple echo server, ran a load test. Then added:

100000.times { Time.now }    # represents an intense loop: memory allocation + system call
sleep rand * 3               # an effective representation of blocking I/O

The complete code is on GitHub.

How many can we run in parallel? How many can we run per machine? How many can we run per core?

The code uses a statistics library, statosaurus, that they use at Twitter (Nick’s GitHub example contains a version of statosaurus which he says contains the key parts of the proprietary Twitter package). Recommendation: log everything extensively, and thread transaction ids throughout your logs (in SQL queries, HTTP headers, etc.). This is essential for tracking down failed distributed transactions.

For this example, he logs the following:

a time stamp
a transaction id
wall-clock time: amount of elapsed real time
system time: amount of time the process has spent on the CPU while in kernel mode
user time: amount of time the process has spent on the CPU while in user mode

Note: system time + user time < wall-clock time, because some of the elapsed time is spent waiting: either on blocking I/O (simulated here by sleep), or in the “run queue” when there are too many processes on the machine for the number of cores. The latter, excessive context switching, is what we want to investigate.
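Here is a rough sketch of how these measurements could be collected in plain Ruby. This is not statosaurus or Nick’s code; the `measure` helper, the log format, and the fixed 0.2-second sleep are my own assumptions, using only `Process.times` and `Process.clock_gettime` from the standard library.

```ruby
require 'securerandom'

# Hypothetical helper (not part of statosaurus): runs the block and returns
# the timestamp, a transaction id, and wall-clock, system, and user time.
def measure(transaction_id = SecureRandom.hex(8))
  cpu_before  = Process.times
  wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  wall_after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_after  = Process.times
  {
    timestamp:      Time.now.utc,
    transaction_id: transaction_id,
    wall:           wall_after - wall_before,
    system:         cpu_after.stime - cpu_before.stime,
    user:           cpu_after.utime - cpu_before.utime
  }
end

stats = measure do
  100_000.times { Time.now }  # CPU-bound work
  sleep 0.2                   # stands in for blocking I/O
end

# One log line per transaction; the id is what you would thread through
# SQL comments, HTTP headers, and so on.
puts format('%s %s wall=%.3f system=%.3f user=%.3f',
            stats[:timestamp].strftime('%Y-%m-%dT%H:%M:%SZ'),
            stats[:transaction_id], stats[:wall], stats[:system], stats[:user])
```

Because of the sleep, the wall-clock figure should come out noticeably larger than system plus user time.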

Generally if we take the wall-clock time and divide by (system time + user time) we get the optimal number of processes per core. This leads us to a distribution strategy.
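As a worked example of that division, here is a small sketch. The helper name and the numbers are made up for illustration; they are not measurements from the talk.

```ruby
# Hypothetical helper: wall-clock time divided by CPU time (system + user).
def processes_per_core(wall, system, user)
  cpu = system + user
  raise ArgumentError, 'CPU time must be positive' unless cpu.positive?

  wall / cpu.to_f
end

# Illustrative numbers: a request takes 2.0 s of wall-clock time,
# of which only 0.5 s (0.25 system + 0.25 user) is spent on the CPU.
puts processes_per_core(2.0, 0.25, 0.25)
```

The intuition: if a process is only on the CPU for a quarter of its elapsed time, roughly four such processes can share one core before they start queuing behind each other.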

Nick talked about different mechanisms for distribution: TCP Proxy, DNS (compelling for a chatty protocol), client (has some serious drawbacks for maintenance/upgrades). In this case, proxy is an optimal solution.

Use a strategy of “least connections,” aka “by busyness,” which is more effective than round robin.
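A toy sketch of the “least connections” idea follows. The class and the backend names are hypothetical, and a real proxy would track connections itself; this only shows the selection rule.

```ruby
# Hypothetical balancer: tracks open connections per backend and always
# hands out the backend that is currently least busy.
class LeastConnections
  def initialize(backends)
    @open = backends.map { |backend| [backend, 0] }.to_h
  end

  # Pick the backend with the fewest open connections and count the new one.
  def checkout
    backend, = @open.min_by { |_, count| count }
    @open[backend] += 1
    backend
  end

  # Call when a connection to the backend closes.
  def checkin(backend)
    @open[backend] -= 1
  end
end

lb = LeastConnections.new(%w[app1 app2])
first  = lb.checkout   # app1: both idle, first one wins
second = lb.checkout   # app2: app1 already has a connection
lb.checkin(second)     # app2 finishes its request quickly
third  = lb.checkout   # app2 again, because app1 is still busy
puts [first, second, third].inspect
```

Round robin would have sent the third request to app1 even though it is still working on the first one, which is why routing by busyness tends to balance uneven workloads better.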

True efficiency: never do the same work twice.

Locality: analogy to tape drive, where if you write close to where you last wrote or read, then it will be significantly faster due to spatial locality. The same applies to hard drives and databases.
Cache is a spatial locality that keeps the data near the code. Put the requests on processes where the data is most likely to be cached. Sticky sessions can be an essential technique.
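One simple way to get this kind of stickiness is to hash a stable key (a session or user id, say) to a process. The sketch below is my own illustration with hypothetical names, not something shown in the talk.

```ruby
require 'zlib'

# Hypothetical router: the same key always maps to the same process,
# so that process's cache is likely to already hold the key's data.
def process_for(key, processes)
  processes[Zlib.crc32(key.to_s) % processes.size]
end

processes = %w[worker0 worker1 worker2]
first_visit  = process_for('user:42', processes)
second_visit = process_for('user:42', processes)
puts first_visit == second_visit
```

Note that plain modulo hashing remaps most keys whenever the number of processes changes; consistent hashing is the usual way to limit that.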

the evolving ultrasaurus

Sarah Allen's reflections on internet software and other topics

Monthly Archives: April 2009

open source digital voting software

using ruby to fight aids

magic scaling sprinkles