Resume for Daniel S. Wilkerson

http://danielwilkerson.com/resume-dsw.html

Experience

Program in C, C++, Java (plus Google Web Toolkit & Google App Engine), Python (also GAE), Perl, Make, Bash, and to a lesser extent Ruby (some Rails), LISP/Scheme, SQL. Familiar with the Unix (Linux, OS X) platform but not Windows. Have published working Open Source code that some people seem to find useful (see below). Worked as a programmer on a UC Berkeley research team. Have worked independently for long periods. Can express myself in written English.

Total of 8 years, 4 months paid industry and university staff employee programming experience (one year of that was half-time) and about 3 years and 6 months as a contract programmer. In addition years of writing open source and doing Computer Science research on my own; see below.

Recent

Client: Abbott Laboratories / UCSF

June 26 - August 3, 2012; October 15, 2012 - March 1, 2016. Continuing to work with the UCSF Viral Discovery and Diagnostics Center, this time as a contractor for Abbott, which funds the UCSF lab. Work designing and writing a high-performance biological sequence alignment tool-kit; 43KLOC C++, 10KLOC tests.

University of California San Francisco Viral Discovery and Diagnostics Center

December 10, 2011 - January 10, 2012, plus further (so far unpaid) design work in April 2012 (amazingly, difficulties with the UCSF legal bureaucracy is making it quite difficult for Dr. Chiu to arrange pay us despite his heroic efforts to do so.) Helped convert a prototype for a virus analysis pipeline into a schema for an industrial-strength implementation. This was work with Charles Chiu and Narayanan Veeraraghavan of Dr. Chiu's lab and Doug Judd of Hypertable.

Independent projects

August 2006 - Present.

Harmony Explained: Progress Towards A Scientific Theory of Music: Most music theory books are like medieval medical textbooks: they contain unjustified superstition, non-reasoning, and funny symbols glorified by Latin phrases. How does music, in particular harmony, actually work, presented as a real, scientific theory of music? In particular we derive from first principles of Physics and Computation the following three fundamental phenomena of music:

Manhattan-Connected Exact-Coloring is NP-Complete: Accidentally solved an open theory problem while helping a friend with his homework (of course with the intent to provide proper notification to the course grader) who was taking Professor Richard M. Karp's CS 270 Combinatorial Algorithms and Data Structures class at The University of California, Berkeley.

Distributed Transactions for Google App Engine: (a few years off and on) Optimistic Distributed Transactions built upon Local Multi-Version Concurrency Control, a project with Simon Fredrick Vicente Goldsmith, Ryan Barrett, Erick Armbrust, Robert Johnson, Alfred Fuller. Ryan Barrett, one of the founders of the Google App Engine team, said "Thanks. This has been on our todo list for years." Erick Armbrust at Google has implemented it as part of tapioca-orm. Presented by me at CodeCon 2009 and at Google I/O 2009 (see video of me speaking).

Hard Object (several years, started around 2005): a lightweight change to the standard x86 hardware platform to provide software engineers with fine-grain memory safety; I am the lead of this project having many collaborators. Many projects have tried to do this before, going back several decades, but ours actually seems to work! Mark Winterrowd and I have worked together on it full time for about a year until he was recently hired at Coverity, but the project goes back to about 2005 an involves several collaborators at Cal Berkeley. This is a long-term project and, except for the first tech report, remains unpublished as we file patent applications. Final tech report still in preparation.

MTS: Multi-TeSter: a simple domain-specific language for maintaining tests that supports two properties I have found desirable:

A Proposal for Proquints: Identifiers that are Readable, Spellable, and Pronounceable; Open Source implementation in C and Java: http://github.com/dsw/proquint. Identifiers (IDs) are pervasive throughout our modern life. We suggest that these IDs would be easier to manage and remember if they were easily readable, spellable, and pronounceable. As a solution to this problem we propose using PRO-nouncable QUINT-uplets of alternating unambiguous consonants and vowels: proquints.

My functional, object-oriented programming language, unpublished: several months full time, suspended.

University of California, Berkeley

February 2002 - August 2006 (the last year was half-time). Title: Programmer/Analyst III. Worked with Professors Alex Aiken and David Wagner. Implementation work: mostly C++ and a bit of Perl.

Elsa and Oink: Worked on Elsa, a C++ front-end maintained by Scott McPeak. Wrote a client of Elsa called Oink that is a framework for static analysis of C and C++. The primary tool of the framework right now is a second-generation implementation of the Cqual data-flow analysis tool by Jeff Foster and others. Elsa and Oink were presented by Scott and me at CodeCon 2006. Karl Chen has used Cqual++ to find what seem to be many previously unknown remotely exploitable printf() format string vulnerabilities in Debian; the false positive rate is less than 50% at the package granularity. Elsa (but not Oink) is now used by Mozilla as part of their refactoring effort. After the Heartbleed bug came out, someone at a government lab that will not let me use their name wrote me, saying:

Yes, you are interpreting me correctly. CQual++ found heartbleed while the commercial tools I tried did not. I also applied CQual++ to an important internal project and found it very effective (though a bit difficult to run and interpret) at identifying places where sanitization routines weren't being called consistently.
and
I've been applying cqual to a fairly significant library that we've been developing that does a fair amount of both network and filesystem related stuff. I don't think I've hit any false positives. There were a number of places where input was being validated but not marked as such which of course caused issue reports but that's not really a false positive.

Worked with Mozilla engineers designing a Garbage Collection Safety analysis for use on Mozilla's C implementation of JavaScript, SpiderMonkey; implementation of the analysis remains unfinished due to unfortunate inability to agree on contract terms with Mozilla (as far as I can tell, the contract would have potentially made me personally liable if anyone had ever sued Mozilla regarding the work); unpublished, but the parts of the analysis that were finished occur in the Oink distribution.

Added to Oink/Cqual++ a feature for lightweight enforcement of privacy boundaries in C at the file granularity which requires no modification to the C code; unpublished, but the parts of the analysis that were finished occur in the Oink distribution. This analysis was designed in collaboration with Matt Harren and David Molnar.

Build Interceptor: Oink needs the pre-processed (.i) files from a project to work on; this can be hard to obtain as build processes are varied. Implemented a second (or third) generation tool based on previous efforts by Hao Chen and Ben Liblit called Build Interceptor (build-interceptor.tigris.org) that solves this problem as long as the compiler is gcc; it is used to intercept much of Debian, per our analysis mentioned above.

Delta: Scott McPeak and I wrote Delta (delta.tigris.org), a simple implementation of Andreas Zeller's Delta Debugging algorithm. Delta can minimize a quarter-million line file that crashes a C++ front-end down to a few pages of code that produces the same result and do this completely automatically. I have been told that "delta is saving a lot of gcc developers life[sic]." This was my first Open Source project and was released to the public at the request of Microsoft Research (!). Presented by me at CodeCon 2006; I was the first person in the history of CodeCon to get two projects accepted in the same year (the other was Oink).

Trend-Prof: Worked with Simon Goldsmith on Trend Profiler (trend-prof.tigris.org), a new kind of profiling tool that looks for trends in the performance of a program; when tested on only small and medium sized inputs Trend Prof allows the programmer to find super-linearities in a program that might not only show up on a tool like gprof until the program is run on large inputs. That is, super-linearities show up to Trend Prof even when they are a small percentage of the whole running time. Presented by Simon at CodeCon 2009. Published as part of Simon's Ph.D. thesis at Cal Berkeley and separately as Measuring Empirical Computational Complexity.

Maximus: Have done a re-implementation of a second-generation version of Alex Aiken's MOSS program for string matching detection called Maximus; unpublished because UC will not allow it to be released as Open Source due to the Winnowing algorithm being patented. Like MOSS, Maximus uses Karp-Rabin and the Winnowing Algorithm (which I co-invented); however it extends the precision of MOSS by also implementing, as suggested by Joel Auslander, Ukkonen's Linear Time Suffix Tree algorithm.

Subversion tools: Faced with needing to modify the history of a Subversion repository, implemented svn_dump2dir and svn_dir2dump, two Perl scripts that explode Subversion dump files into directories and back. The svn dump file format quotes embedded files using counts, making them hard to edit or patch; whereas in directory form editing is straightforward. Implemented simple_patch Perl script that does less than what the standard patch tool does so it is easier to control. The only user feedback I've gotten is that the police of Queensland Australia seem to use it for something.

Digital Integrity

August 1998 - April 2001. Implementation work: C, Perl, Java (server-side), HTML. User interface work for three-tier web application talking to our proprietary search engine using both Java Servlets and Apache/Perl CGI. Implemented an alternative to Java Server Pages on top of Servlets but it wasn't used.

Theoretical work: Co-invented a new algorithm, Winnowing, for all-to-all string matching that for the parameters at which we used it reduced the size of the database by a factor of 8 and the query time by a factor of 3. (Implemented in maximus above.)

Collaborative Computing Systems

July 1997 - January 1998. Worked on three-tier HTML-Java-JDBC-SQL intranet/Internet applications. Worked on a simpler replacement for Java Server Pages that we used. Was on a team of about four who implemented the budget and employee management tool that Swiss Bank used worldwide in 1997.

NetJet Communications

April - June 1997. Java and JavaScript programming. Worked on client-side GUI programming and applet-browser interaction with both Netscape Navigator and Internet Explorer.

University of California, Berkeley

Graduate Student Instructor for four semesters:

Digital Equipment Corporation, Maryland

Sales engineer summer employee. Tracked a library of workstations used for demonstrations to potential customers by the sales force in the D.C. area. When asked to write a program to track the hardware the sales engineers brought in and out I instead designed and implemented a simple doubly-indexed library card system (yes, made out of paper); this replaced the cumbersome and unused paper forms and had none of the overhead and unreliability of software.

Anne Arundel Community College, Arnold, Maryland

Spring 1989. Computer lab and math lab assistant.

Hampshire College Summer Studies in Mathematics

(late '80s?) summer. Three weeks as a teaching assistant.

Backyard Boats, boat dealership in Annapolis, Maryland

Summer 1987. Scrubbing decks and hulls, painting teak oil on yacht trim.

Office of Daniel Coyle Wilkerson, Jr. M.D. (my father) in Arnold, Maryland

Late 1970s (as a child). Medical office work, filing patient records, autoclaving medical instruments, and assisting in the retrieval of unprepared dead bodies.

Education

University of California, Berkeley, five years graduate study in Theoretical Computer Science (from the Math department), 1991-1996.

Green Gulch Farm Zen Center, most of Spring practice period 1990 or 1991, plus a few additional weeks, so 8 - 10 weeks total. Full time student of Soto / Dogen Zen; we sat five 40-minute periods per day and then a five day sesshin of sitting all day. Have since sat several 1-day sesshins. About twelve years later returned for a seven day Rohatsu sesshin, which was almost disappointingly easy given my years of home practice.

University of Maryland at College Park, Bachelor of Science, Mathematics, 1990.

Budapest Semesters in Mathematics, Spring 1990.

Hampshire College Summer Studies in Mathematics, Summer 1986. Later also taught there by invitation.

Johns Hopkins Center for the Advancement of Academically Talented Youth, three-week summer programs: 1983 Intro to CS, 1984 Data Structures, 1985 Automata Theory and Formal Languages.

My brother and I taught ourselves to code in BASIC on an Atari 800. My mom says I was 10 and he was 9, but I remember being 12.

Publications

Note that these articles can be downloaded from my homepage, except the Math Magazine article.

I came to this problem through a friend who was taking Professor Richard M. Karp's CS 270 Combinatorial Algorithms and Data Structures class at The University of California, Berkeley. At the time I was just trying to help a friend with his homework (of course with the intent to provide proper notification to the course grader). After I solved it my friend told me that in fact neither Karp nor his teaching assistant had a solution.

Manhattan-Connected Exact-Coloring Is NP-Complete. Daniel S. Wilkerson.

I am proud of this paper as it is a simple solution to a practical problem. My contributions include that I conjectured power-laws as a low-dimensional way to model the number of executions of a line of code in a program.

Measuring Empirical Computational Complexity. SEC/FSE'07: The 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Simon F. Goldsmith, Alex S. Aiken, Daniel S. Wilkerson.

This paper is theoretically cool, but more importantly it actually made a real string-matching index 8 times smaller and 3 times faster on the same queries. I am now only interested in this kind of practical stuff.

Winnowing: Local Algorithms for Document Fingerprinting. Saul Schleimer, Daniel S. Wilkerson, Alex Aiken. ACM Special Interest Group on Management of Data, 2003.

I worked on the theory part of this paper and the relevance of that part to reality is pretty questionable. The implementation side done by Alan and Brent is pretty cool, but I didn't work on that.

System Area Network Mapping. A. Mainwaring, B. Chun, S. Schleimer, D. Wilkerson. ACM Symposium on Parallel Algorithms and Architectures, 1997.

The papers below are useless theoretical junk, but you have to put something on your resume. I no longer do this kind of stuff.

Faster Computation on Directed Networks of Automata. R. Ostrovsky, D. Wilkerson. ACM Symposium on Principles of Distributed Computing, 1995.
Nearly Tight Bounds for Wormhole Routing. A. Ranade, S. Schleimer, D. Wilkerson. IEEE Symposium on Foundations of Computer Science, 1994.
Optimal Leapfrogging. J. Auslander, A. Benjamin, D. Wilkerson. The Mathematics Magazine, 1994.

Personal

Write the occasional non-fiction short story or essay (see my web page). Wrote a real Theory of Music to replace the current "medieval medical textbook" style pseudo-theory (now published, see above). Speak some conversational Hungarian. Like to sail, hike, and listen to live music.