From WikiChip
Difference between revisions of "schlemiel the painter's algorithm"

m (Generic example)
m
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
{{title|Schlemiel the Painter's Algorithm}}
 
'''Schlemiel the Painter's Algorithm''' (also spelled ''Shlemiel'') is a term referring to a class of [[routines]] that may seem to perform well under small workloads but prove to be highly inefficient as they scale due to needlessly redundant operations that are performed at a [[lower level]]. The term was coined by [[Wikipedia:Joel Spolsky|Joel Spolsky]] in late 2001.
 
'''Schlemiel the Painter's Algorithm''' (also spelled ''Shlemiel'') is a term referring to a class of [[routines]] that may seem to perform well under small workloads but prove to be highly inefficient as they scale due to needlessly redundant operations that are performed at a [[lower level]]. The term was coined by [[Wikipedia:Joel Spolsky|Joel Spolsky]] in late 2001.
  
Line 28: Line 29:
 
</source>
 
</source>
  
And so forth, which looks very clean and produces <code>"one, two, three, four"</code>. Unfortunately, for every {{C|string.h/strcat|strcat}} call, strcat has to start from the beginning and seek the end of the string all over. This operation becomes more and more costly as the string becomes longer, just as Schlemiel the Painter had to walk more and more to get back to his paint can. An operation that should only take \(O(N)\) is implemented above as \(O(N^2)\).  
+
And so forth, which looks very clean and produces <code>"one, two, three, four"</code>. Unfortunately, for every {{C|string.h/strcat|strcat}} call, strcat has to start from the beginning and seek the end of the string all over. This operation becomes more and more costly as the string becomes longer, just as Schlemiel the Painter had to walk more and more to get back to his paint can. An operation that should only take <math>O(N)</math> is implemented above as <math>O(N^2)</math>.  
  
 
=== Generic example ===
 
=== Generic example ===
Line 50: Line 51:
 
</pre>
 
</pre>
  
If the function was to be used on a simple file with 2 lines, function ''file_get_line'' will end up reading line 1 to get the first line and then it will read line one again and line two to get to the second line, resulting in 3 line scans. If the file was to have 10 lines, ''file_get_line'' will end up reading a total of 55 lines (\(\sum\limits_{i=1}^{10} i\)) over and over. It can be seen that the number of operations done grows very quickly even for moderately small files. A file with just 10,000 lines which could usually be read in just 10,000 seeks will result in \(\sum\limits_{i=1}^{10000} i = {50,005,000}\) seeks with this implementation.
+
If the function was to be used on a simple file with 2 lines, function ''file_get_line'' will end up reading line 1 to get the first line and then it will read line one again and line two to get to the second line, resulting in 3 line scans. If the file was to have 10 lines, ''file_get_line'' will end up reading a total of 55 lines (<math>\sum\limits_{i=1}^{10} i</math>) over and over. It can be seen that the number of operations done grows very quickly even for moderately small files. A file with just 10,000 lines which could usually be read in just 10,000 seeks will result in <math>\sum\limits_{i=1}^{10000} i = {50,005,000}</math> seeks with this implementation.
 +
 
 +
== External links ==
 +
* [http://www.joelonsoftware.com/articles/fog0000000319.html Joel Spolsky's Blog Post]
  
 
[[category:optimization]]
 
[[category:optimization]]
 
[[category:algorithms]]
 
[[category:algorithms]]

Latest revision as of 00:56, 21 February 2016

Schlemiel the Painter's Algorithm (also spelled Shlemiel) is a term referring to a class of routines that may seem to perform well under small workloads but prove to be highly inefficient as they scale due to needlessly redundant operations that are performed at a lower level. The term was coined by Joel Spolsky in late 2001.

Background[edit]

The term is based on the Yiddish joke involving Schlemiel (He שלעמיל) which means an inept clumsy person. Spolsky explained it in his blog as:

Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.

The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.

The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"

"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"

Example[edit]

C[edit]

Spolsky used the c's strcat() function to illustrate his point. strcat() concatenates a second string onto the first one by traversing the first string by checking each character and locating the null character that terminates the string. The second string is then copied over to the end of the first string, concatenating them. The old, and new, lengths of the string are discarded once the operation is done. For example:

char string[1000];
strcpy(string, "one");
strcat(string, ", two");
strcat(string, ", three");
strcat(string, ", four");
...

And so forth, which looks very clean and produces "one, two, three, four". Unfortunately, for every strcat call, strcat has to start from the beginning and seek the end of the string all over. This operation becomes more and more costly as the string becomes longer, just as Schlemiel the Painter had to walk more and more to get back to his paint can. An operation that should only take Equation upper O left-parenthesis upper N right-parenthesis is implemented above as Equation upper O left-parenthesis upper N squared right-parenthesis .

Generic example[edit]

An alternative example involves a random file access library module that provides a hypothetical function:

string file_get_line(string filename, uint line);

The hypothetical function opens file filename, seeks line line, closes the file and returns the appropriate line. A programmer might be tempted to use that function to read the entire file into an array of lines as:

string[] file_toarray(string file)
{
    string[] lines;
    uint line, size = file_line_count(file);

    for (line = 0; line < size; ++line)
        lines.append(file_get_line(file, line));

    return lines; 
}

If the function was to be used on a simple file with 2 lines, function file_get_line will end up reading line 1 to get the first line and then it will read line one again and line two to get to the second line, resulting in 3 line scans. If the file was to have 10 lines, file_get_line will end up reading a total of 55 lines ( Equation sigma-summation Underscript i equals 1 Overscript 10 Endscripts i ) over and over. It can be seen that the number of operations done grows very quickly even for moderately small files. A file with just 10,000 lines which could usually be read in just 10,000 seeks will result in Equation sigma-summation Underscript i equals 1 Overscript 10000 Endscripts i equals 50 comma 005 comma 000 seeks with this implementation.

External links[edit]