|
Bob Jenkins, Software Developer
bob_jenkins@burtleburtle.net
https://burtleburtle.net/bob
I like algorithms, physics, math, and
combinatorics. I like experimentation, but I also like seeing my work
widely used. I want to design and write and maintain my code myself.
C, Java, C++, C# are fine languages. I want to give users what they
want.
I don't want to be a manager. I don't want to monitor other
people. I like working with other developers and being helpful.
|
- Work Experience
-
IBM (1988) I spent an undergraduate summer as an intern at IBM
(Owego, New York) working for Dick Steflik. I debugged a large
educational management system written in REXX. The existing code
tended to shrink when I edited it. I wrote a total of minus 5000
lines of code that summer. I also developed a new tool to
automatically lay out text hierarchy charts that looked like ones
built by hand.
Oracle (1989-2006) I joined Oracle in October 1989 and worked
there for 17 years, except for nine months sabbatical while I pursued
(did not complete) a doctorate at Berkeley (August 1992 - June 1993).
Oracle's code (C and PL/SQL) is very good. It usually didn't shrink
when I edited it.
- Arithmetic (1989-1990) (select 1+2 from dual) I
made Oracle addition faster, multiplication 4x faster, division 6x
faster, square root 66x faster, and added sin, cos, tan, sinh, cosh.
Nobody after me made much headway until Edmund Waugh, about 1998,
doubled the speed of most routines again. (He replaced my
if-statements with table lookups.) I later did
a floating
point big number library at home designed along the same lines.
- Snapshots (1990-1995) (create snapshot s as select *
from t@link): I did the first implementation of snapshots
(now called materialized views), updatable snapshots, and the job
queue. Replication took over snapshots and the job queue in 1995 when
I returned to constraints full-time.
- Constraints (1990-2006) (create table t (t1 primary
key, t2 references t)) I inherited constraints after the basics
had been finished. I implemented delete cascade, deferred
constraints, unique constraints enforced by nonunique indexes,
constraint/trigger interactions (mutating errors), and
enable-novalidate constraints (which allow constraints to be enabled
without any long term locks). A vital component was a skip list
with rollback integrated with Oracle transactions.
- Character semantics (2000-2006) (varchar2(10
char)) I was in charge of writing the spec for Unicode support
for Oracle 9i (the meetings were many and vigorous) and implementing
it in the RDBMS (three others helped me straighten out the system
quirks that blocked it). I also fielded most issues with the CHAR
datatype, which uses blankpadded semantics. For example, what do you
pad to when there is both a byte limit and character limit?
- Function-based indexes (1997-2004) (create index ti on
t (t1+1)) I did the functional index design and interface. Three
of us implemented it, combining it with parallel projects for virtual
columns and drop column. I also did descending indexes, including
descend and undescend functions
(sys_op_undescend(sys_op_descend(x)) = x), since the index
layer could only support binary comparison.
- Parser (1998-2006) (select p from from q) I owned
Oracle's 25 year old 100,000 line recursive-descent SQL parser. This
meant I reviewed, but did not write, one to three changes to it a day.
I slowly straightening out the parser's quirks. For example,
I added tokenizer support for UCS2, and I would have removed the
requirements that SQL strings be mutable and null-terminated if I
hadn't left in 2006 before it was completed.
- General cursor compilation (1989-2006) (select *
from (select * from dual)) I've dealt with most of cursor
compilation, as well as row sources and the index layer. I was a
catch-all for problems that didn't fit cleanly in any area, or belonged
to code that nobody knew anymore. For example I exposed a
fixed table of all built-in SQL operators and fixed two dozen crashes
that happened when you fed them simple (but inappropriate) arguments.
I taught the RDBMS architecture course twice.
- Hashing (1997-2006): I was Oracle's resident expert on
hash functions. Hashing is not a one-size-fits-all problem, so I was
regularly consulted on what hash to use where, how many collisions are
expected with so many buckets, and how many bytes of hash are needed
to make collisions "never" happen. I also sought out issues with
random numbers, algorithms, combinatorics, and math.
Microsoft (2006-present) I am working in the Cosmos team at
Microsoft. Large streams of data are split into 250MB compressed
append-only extents. Extents are stored in triplicate, and can be
read by user processes that run on the same machines. Mirrors copy
streams from one cluster to another.
- Cosmos Store (2006 - 2007) I worked under Andrew Kadatch on
the Cosmos Store, written by Andrew in C++. I modeled the system,
documented its rules, enhanced the logging, wrote pretty-printers for
the binary files, and fixed bugs. I worked on the lengths of extents,
monitoring disk space, and made CRC computation 6x faster.
- Cosmos PN (2007) The PN (Process Node) is in charge of
managing the user processes running on a given machine. I owned the PN
about a month. I revise password management so we could handle
passwords aging out without any deployments or any downtime.
- Scope (2007 - 2010) The Scope language is
roughly SQL clauses and C# expressions, and it maps to execution graphs
where the nodes run on the Cosmos Store. I implemented SELECT, aggregates,
expressions, and parsing. I replaced GREP-style parsing with a top-down
parser, which I later unwillingly replaced with YACC. I designed and implemented
constant folding, expression normalization, and compiletime variables.
- Cosmos Store (2010 - present) Several things.
- An in-memory B-tree to quickly determine which extent holds the nth byte of a stream
- Fetching, in one call, the locations of extents covering many byte ranges in a large stream
- Improved reporting of data loss; general ownership of store reliability
- Manual and automated measures of data coldness, availability and reliability
- Configuring mirrors through a source-controlled XML file that describes all mirrors (rolled back)
- Refactoring the distributed concurrent mirrors tests to use that config (kept the refactoring)
- Determining extent placement through a tree of hash tables, cached on all clients (rolled back)
- Removed so many bottlenecks from lazy replication. Around 20x more throughput.
- Parallelized garbage collection, allowing metadata rings to hold 4x more data.
- All main data objects in smoothly growable hash tables based on virtual memory arrays
- Faster and smaller storage of updatable lists of integers (a skip
list of short array)
- Unlimited cluster scaling (only implemented for temp data)
- Fairer and lower latency job scheduling (not implemented)
- 60x faster CRC concatenation
- Erasure coding: 4/3 or 7/3 instances per extent. Less space and more reliable than 3-instance.
- Education
-
Undergraduate (plus masters) (1985-1989) Carnegie Mellon University.
I graduated with a BS in Computer Science and a masters in math, QPA
3.7. My master's thesis was a new dynamic
algorithm for calculating the HOMFLY polynomial for directed knots
and links. I drew cartoons for
both campus papers.
Graduate (1992-1993) I spent a year at Berkeley in the
doctoral program for theoretical computer science, but returned to
industry after deciding I didn't want to be a professor.
- Other Projects
-
I develop software tools and primitives at home. I put it all on
burtleburtle.net, public
domain.
I'm interested in cryptographic primitives. My pseudorandom
number generator, ISAAC, was
presented at the 1996 Fast Software Encryption Conference. It hasn't
been broken, it hasn't seen much analysis, it hasn't been widely used.
I developed some test
suites to test
ISAAC, and have also used them to find biases in RC4 and many other CPRNGs
proposed on sci.crypt. I can break 5-bit RC4. I
submitted Maraca
to the SHA-3 competition in 2008, but it was rejected, and later
thoroughly broken (arbitrary collisions can be calculated with 2
seconds of computing time). In 2012
I wrote SpookyHash,
a fast 128-bit noncryptographic hash.
Hashes for hash table lookup are similar to cryptographic
block ciphers. I developed some theory for what
makes a good hash function for hash table lookup. I wrote a hash sieve
to generate random hashes and see if they are good. It could screen
ten a second. I have released several good hashes.
They have been used far and wide.
The surprising efficiency of my hash sieve led me to develop jenny, a
pairwise testing tool.
I wrote an orbital applet
in Java, later rewritten to Javascript, to simulate a set of
orbits I'd designed that should form a Dyson Sphere,
or maybe even a
dense computing core. I have online simulations of the solar
system,
Cruithne,
and Klemperer
Rosettes. I had to derive a 9th degree
symmetric multistep method because the
existing methods I could find (leapfrog, Runge Kutta, Runge Kutta
Fehlberg) weren't accurate enough for my purposes. I later found my
multistep method was reinventing the wheel.
I have a collection of about 50 boy scout skits
that I've seen, which is fairly popular. I've written short stories.
Some years I write music.
I don't watch TV, or play sports, or participate in any organized
activities. I do follow science, the stock market, read science
fiction, and practice musical instruments. Most of my time outside
of work these days goes to raising kids.
References are available upon request. Contact me via email at
bob_jenkins@burtleburtle.net.
|