From WikiChip
Schlemiel the Painter's Algorithm
Revision as of 23:50, 20 February 2016 by ChipIt (talk | contribs) (typo)

Schlemiel the Painter's Algorithm (also spelled Shlemiel) is a term referring to a class of routines that may seem to perform well under small workloads but prove to be highly inefficient as they scale due to needlessly redundant operations that are performed at a lower level. The term was coined by Joel Spolsky in late 2001.

Background

The term is based on the Yiddish joke involving Schlemiel (He שלעמיל) which means an inept clumsy person. Spolsky explained it in his blog as:

Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.

The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.

The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"

"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"

Example

C

Spolsky used the c's strcat() function to illustrate his point. strcat() concatenates a second string onto the first one by traversing the first string by checking each character and locating the null character that terminates the string. The second string is then copied over to the end of the first string, concatenating them. The old, and new, lengths of the string are discarded once the operation is done. For example:

char string[1000];
strcpy(string, "one");
strcat(string, ", two");
strcat(string, ", three");
strcat(string, ", four");
...

And so forth, which looks very clean and produces "one, two, three, four". Unfortunately, for every strcat call, strcat has to start from the beginning and seek the end of the string all over. This operation becomes more and more costly as the string becomes longer, just as Schlemiel the Painter had to walk more and more to get back to his paint can. An operation that should only take Equation upper O left-parenthesis upper N right-parenthesis is implemented above as Equation upper O left-parenthesis upper N squared right-parenthesis .

Generic example

An alternative example involves a random file access library module that provides a hypothetical function:

string file_get_line(string filename, uint line);

The hypothetical function opens file filename, seeks line line, closes the file and returns the appropriate line. A programmer might be tempted to use that function to read the entire file into an array of lines as:

string[] file_toarray(string file)
{
    string[] lines;
    uint line, size = file_line_count(file);

    for (line = 0; line < size; ++line)
        lines.append(file_get_line(file, line));

    return lines; 
}

If the function was to be used on a simple file with 2 lines, function file_get_line will end up reading line 1 to get the first line and then it will read line one again and line two to get to the second line, resulting in 3 line scans. If the file was to have 10 lines, file_get_line will end up reading a total of 55 lines ( Equation sigma-summation Underscript i equals 1 Overscript 10 Endscripts i ) over and over. It can be seen that the number of operations done grows very quickly even for moderately small files. A file with just 10,000 lines which could usually be read in just 10,000 seeks will result in Equation sigma-summation Underscript i equals 1 Overscript 10000 Endscripts i equals 50 comma 005 comma 000 seeks with this implementation.