This problem was solved over 90 years ago. (See here for xkcd author Randall's explanation.)
Of course, this answer is late, so it will never get upvoted :)
Update: There are now two user scripts that augment answer lists with Wilson scores: Sort Best First and Wilson confidence rating calculator. You are welcome to try them in order to assess how much of an improvement it would be to switch to sorting by Wilson score.
[Edit] Layman summary: we want to determine what the upvote percentage would be if everyone voted. But since we only have a small sample of votes, we use fancy statistics to determine a range of percentages that we can be fairly certain the real percentage falls within. We take the lower end of that range, to err on the side of caution.
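For reference, the "lower end of that range" can be written out as follows. This is the lower bound of the Wilson score interval as given on the Evan Miller page linked in the code further down. The symbols are my own shorthand: \hat{p} is the fraction of upvotes (positiveRatio in the code), n is the total number of votes (totalVotes), and z = 1.96 for a 95% interval.

```latex
\text{rating} =
\frac{\hat{p} + \dfrac{z^2}{2n}
      - z\sqrt{\dfrac{\hat{p}(1-\hat{p}) + \dfrac{z^2}{4n}}{n}}}
     {1 + \dfrac{z^2}{n}}

% Special case: every vote is an upvote, so \hat{p} = 1
\text{rating}\big|_{\hat{p}=1} = \frac{1}{1 + z^2/n} = \frac{n}{n + z^2}

% n = 1:   1 / (1 + 3.8416)       \approx 0.21
% n = 100: 100 / (100 + 3.8416)   \approx 0.96
```

So a single upvote with no downvotes rates about 0.21, while 100 upvotes with no downvotes rate about 0.96, even though both have a raw ratio of 100%.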
Here is the output of the equation. You'll notice that when there are many votes, the output is close-ish to positiveVotes/totalVotes, but when there are few votes it's much smaller.
This is exactly what we want.
Here is some code:
/// <summary>
/// Returns a rating for the given post. Larger is better.
/// Based on the equation found at http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
/// </summary>
public double GetPostRating(int numPositiveVotes, int numNegativeVotes)
{
    int totalVotes = numPositiveVotes + numNegativeVotes;
    if (totalVotes == 0)
        return 0;

    const double z = 1.96; // z-score for a 95% confidence interval under the normal approximation
    double positiveRatio = ((double)numPositiveVotes) / totalVotes;

    // Crazy equation to find the "Lower bound of Wilson score confidence interval for a Bernoulli parameter".
    // Again, see the above webpage.
    return (positiveRatio + z * z / (2 * totalVotes)
            - z * Math.Sqrt((positiveRatio * (1 - positiveRatio) + z * z / (4 * totalVotes)) / totalVotes))
           / (1 + z * z / totalVotes);
}
Note that the above equation assumes upvotes and downvotes have the same frequency. Since upvotes are way more common, downvotes should ideally be weighted more harshly (in other words, three downvotes say a lot more about an answer than three upvotes).
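One rough way to sketch that weighting, purely as an illustration: count each downvote as more than one vote before feeding the totals into the same formula. The class name WeightedPostRating, the method GetWeightedPostRating, and the default weight of 2.0 are all hypothetical and untuned. Scaling the counts is a heuristic, so the result is no longer a true 95% confidence bound.

```csharp
using System;

public static class WeightedPostRating
{
    // z-score for a 95% confidence interval under the normal approximation.
    private const double Z = 1.96;

    // Hypothetical variant of GetPostRating: each downvote counts as
    // `downvoteWeight` votes. The default of 2.0 is an illustrative
    // assumption, not a measured value. A weight of 1.0 reproduces the
    // unweighted Wilson lower bound.
    public static double GetWeightedPostRating(
        int numPositiveVotes, int numNegativeVotes, double downvoteWeight = 2.0)
    {
        double positive = numPositiveVotes;
        double negative = numNegativeVotes * downvoteWeight;
        double total = positive + negative;
        if (total == 0)
            return 0;

        double positiveRatio = positive / total;
        double zSquared = Z * Z;

        // Same lower-bound formula as before, applied to the weighted counts.
        double margin = Z * Math.Sqrt(
            (positiveRatio * (1 - positiveRatio) + zSquared / (4 * total)) / total);
        return (positiveRatio + zSquared / (2 * total) - margin)
               / (1 + zSquared / total);
    }
}
```

With this sketch, GetWeightedPostRating(10, 3, 1.0) gives the same rating as the unweighted function, while GetWeightedPostRating(10, 3) with the default weight rates the same post lower, because the three downvotes count as six.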
Also, I believe newer answers should be given preferential treatment, at least for a few minutes (see my comment below).
But even without these, this is a neat improvement.