sarah « the evolving ultrasaurus

As I think about developing new Internet-connected software, I worry about the safety of the people who use it. By 2021, most Web browsers won’t allow native code extensions, which will eliminate a lot of potential issues, while a hug swath of creative animations and interactives will disappear from the Web. I spent some time this summer, thinking about what I could learn from the security vulnerabilities in the Flash Player that has been much maligned in recent years.

Flash CVEs (2001-2009)

I looked at the Common Vulnerabilities and Exposure List (CVE List hosted by Mitre with all reports 2001-2019. I found 1172 Flash Player vulnerabilities, which sounds huge, but in context of vulnerabilities reported in Web Browsers, doesn’t look that bad:

1172 Flash Player
1999 Internet Explorer
2033 Chrome
2442 Firefox

Note: these numbers don’t necessarily tell us that Firefox had more vulnerabilities than Internet Explorer. It could mean that Firefox was more rigorously open in reporting vulnerabilities, which seems likely.

Understanding attack vectors

Vulnerabilities in the Flash Player were particularly dangerous because Flash was installed on all of the Web Browsers, so any flaw in Flash was much easier to exploit than a flaw in a specific Web Browser. To understand this, one needs to understand that the primary attack vector enabling a hacker to take advantage of a vulnerability in Flash Player was to create a malicious Flash application or movie that would distract the user while doing something illicit or intentionally trigger a crash and then exploiting that crash to execute native code with access to the user’s machine.

In the larger context of a specific attack, a vulnerability in the Flash Player would typically need to be combined with something else:
* deceptive emails (aka phishing)
* deceptive websites
* “man in the middle” attacks (replace real Web content with malicious content that appears identical)

Categorizing vulnerabilities

I conducted a rough cut analysis of matching terms by reading the list of CVEs and creating categories that might provide instructive value in thinking through how to avoid similar issues in the future.

pie chart illustrating data in table below

802 memory safety
42 other code execution
58 XSS, CORS, CLRF
61 parsing / validation
13 clickjacking
91 bypass sandbox
105 other

Memory safety (~70%)

The vast majority of issues (“memory safety”) resulted from coding errors, which can now be avoided with modern programming languages. For a long time, we’ve been able to use languages like Erlang/Elixir, Java, Python, Ruby, and Go for server-side coding with memory safety features. Even C++ has language features and libraries (though you must choose to use them). Now, for low-profile client software we can use Rust or WebAssembly when we need something higher performance or less memory-hungry than JavaScript.

Escaping the “sandbox” (~15%)

If we develop code that runs in a Web browser, we can trust the browser’s “sandbox” — our apps can only use a restricted set of APIs. If we’re writing a Web browser or any other Internet-connected software used by humans or machines, then it is a good idea to carefully isolate our code that can access the operating system to write files or make network calls.

From my CVE analysis, coding errors in this category resulted in just over 15% of CVEs (other code execution, bypassing sandbox, and XSS, CORS, CLRF issues). Of course, the biggest thing you can do is not include the code that does powerful things you don’t want to allow. However, sometimes you do need to load and execute a shared library, accessing the filesystem and the network.

Parsing / Validation (~5% / ~15% excluding memory safety)

Parsing and validation of input (mostly reading a file or parameter) is another common coding error pattern which can result in a serious vulnerability. Having to fix these kinds of issues causes me to be very careful when adding parsing code to any app or library. If we exclude memory safety errors, parsing and validation errors are larger than any identified class of error.

pie chart excluding memory safety shows 16% parsing / validation

Clickjacking and “other”

Clickjacking is noteworthy for anyone developing a client app with extensions where 3rd party developers (or other users via content sharing) can present information to the user and allow interaction. This class of attack uses features that are designed to empower users to present compelling content to be instead used to trick people into doing something unintended. For example, there were bugs that allowed Flash content to overlay other web pages or browser UI, thereby tricking the user into clicking or typing in a way to provide privileged access.

Perhaps “other” deserves a closer look, but I didn’t find clear patterns and suspect that contains many smaller categories.

Parsing is hard

In my experience, many programmers recognize that implementing an extension mechanism that allows for user interaction or providing a “sandbox” for 3rd party code can be very tricky to get right and will exercise great care in writing or using that kind of code. However, I have often interacted with programmers who don’t seem to believe that writing code to parse text is difficult. Writing code that performs the intended action is not hard, but writing code that has no unintended effects requires very careful coding and a little imagination.

Looking toward open source code for some examples to learn from, here are a few examples of URL parsing libraries where bugs were found (and fixed) after vulnerabilities were discovered in the field:

https://github.com/envoyproxy/envoy/issues/7728 (Envoy proxy)
https://go-review.googlesource.com/c/go/+/189258/ (Go)
https://www.cvedetails.com/cve/CVE-2018-3774/ (url-parse Node library)

The results of this analysis were included as part of Code Mesh LDN 19 talk, A landscape of unintended consequences (video, slides). The data and methodology is available at on github: ultrasaurus/flash-cve-analysis.

RTMP: web video innovation or Web 1.0 hack… how did we get to now? (Demuxed 2019)

It was fun to go back in time and recall why Flash was great in 2000, when IE 5.5 had just been released and you couldn’t rely on CSS actually working. In prepping for this talk, I worked really hard to try to express what Web development was like then and why people loved Flash: “200K of cross-platform goodness.” Flash made the Web work for high fidelity interactive graphics 20 years ago, which I think helped drive Web standards to support more than text, images and links.

“We wanted to support all the people on the internet.” It still boggles my mind how we could support low-latency way-back-then and now when computers and networks are faster it seems impossible… sometimes I try to visualize what is happening to the bits as I wait for something to happen. (I really do know why things are slower now, and it’s not just about the tech, but that’s a different story.)

HTTP tunneling worked much better than you would expect… sometimes it’s good to make something work for everyone, even if not optimal.

I fondly remembering Doug Engelbart telling me that his Augment system (built in the 1960s) had more features than Flash Player 6 + Flash Communication Server in 2002. (It’s great when your heroes tell you that your great accomplishments are not all that interesting.) Later, he did acknowledge that making this stuff available to everyone on the Web was “pretty good.” Inspiring widespread adoption, creating an ecosystem is a different kind of innovation.

The thing that makes it an ecosystem is that each essential component can be bought from multiple companies and is available as open source. At first Flash was essential, now much later, Flash doesn’t really matter anymore to the relevance of the RTMP protocol.

Today there are 500M IP camera on the Internet, about the number of people on the Web when Flash Player 6 was released. SmartHome video sensors have insane growth. The future of video is not about how to catch up with latency and resolution of live broadcast TV (though that will happen), it’s about how we can be integrate video streams from new devices, how we can help the machines help the people by creating new applications. Maybe RTMP will be a part of that, what do you think?

As a newcomer to Rust, my suggestion for 2020 theme is to fulfill the promise of “empowering everyone to build reliable and efficient software” by finishing what’s started (rather than adding new features), continuing the focus on good docs and good tools, and expanding to develop a coherent ecosystem.

Rust empowers you to reach farther, to program with confidence in a wider variety of domains than you did before. — Rust Language Book forward

Overview themes

2020 roadmap: finish what’s started, fulfill the promise

2021 edition: scalability – Can newcomers to Rust create a real-world, complex system without recreating basic components or contributing to the language itself?

more scalable systems written in Rust
experienced C/C++ engineers can easily transition to Rust
more scalable ecosystem
- commonly needed libraries are available
- new engineers can easily become contributors

Keep doing

Tooling is great! rustup toolchain, feature flags, online/offline docs make it easy to experiment with new Rust/crate features, even as a relatively new Rust programmer.
Transparency (like this call for blog posts, RFC process including roadmap)
Focus on good docs & good error reporting is incredibly helpful. Keep iterating on this!

Feature requests

safety beyond memory safety and concurrency. For example: URL parsing should ~~be in std~~ have a shared implementation that supports common use cases — it’s risky for Internet apps to not have a stable, well vetted URL parser, why are there three? (That’s a rhetorical question. I know why, but don’t think there need to be. See twitter discussion [Update 12/16/2019: I’d like to see convergence in URL parsers (or perhaps a shared common library) the way the community seems to be converging around serde.]
async all the things! I think this is already the plan. I look forward to async I/O (network and files) to be supported in the std library. I appreciate the thoughtfulness about safety, factoring out useful core concepts (like Pin/Task), and ensuring compatibility with Futures and Tokio crates. Consider other async use cases: GPU, OpenGL
lifetimes visualization would accelerate learning curve on resolving compiler errors, good ideas in this thread

Slow down to speed up

In my experience writing documentation often uncovers design issues and bugs. RFC template has a guide-level explanation section, which is great, and taking that one step further to writing baseline docs before declaring a feature “stable” would create positive pressure for community focus. Some ideas for process improvements…

A crisp “definition of done” could help focus the community. Consider adding requirements to releasing ‘stable’
- RFC updated to reflect what was completed and is still open
- stable reference docs are complete or include link to RFC
Consider WIP limits: how limiting work-in-progress increases speed

It seems in keep with Rust values to create a strong incentive to support contributing writers who are working to take the feature over-the-line and encourage new engineers to contribute. It is easier for new contributors to work with APIs that are documented or clearly dive into a work-in-progress, aware that they are contributing to finishing something.

Other Improvements

Documentation shouldn’t require deep knowledge of Rust (example: https://stackoverflow.com/questions/56402818/how-do-i-interpret-the-signature-of-read-until-and-what-is-asyncread-bufread-i/56403568#56403568)

Background

The reason I’m learning Rust is that I am experienced engineer with a need to write performant, low-profile client/server code. I’m excited about the idea of writing one body of code that can (potentially) work across native desktop, mobile, servers… and with cross-compilation to WebAssembly (Wasm), also browsers and edge servers.

Arguably, C works for all my needs, it even cross-compiles to Wasm. I want to like Rust better. I do in theory, but in practice, it’s got a lot of sharp edges (which is saying a lot when comparing it to C).

Updated Dec 2019 to modify the introduction, so excerpt is useful (if rust2020 summary post is ever updated. Original text below:

answering the Rust programming language call for blog posts as input to the 2020 roadmap

Caveat: I am new to Rust. There’s probably stuff I don’t even know about that is more important than anything here. This is just me doing my part to give back to the awesome Rust community.

the evolving ultrasaurus

Sarah Allen's reflections on internet software and other topics

Author Archives: sarah

memory safety: necessary, not sufficient