Do Your Dice Roll True?
The founder of GameScience, Lou Zocchi, has long claimed that GameScience dice roll more true than other gaming dice. In a well-known GenCon video Zocchi explained why GameScience dice should roll more true.
His logic is that due to how dice are made, traditional RPG dice are actually put through a process similar to a rock tumbler as part of the painting and polishing, and this process causes the dice to have rounded edges. In theory the uneven rounding gives the dice an inconsistent shape that favors certain sides. GameScience dice are not put through this process, which is why they retain their sharp edges and is also why their dice come uninked.
While Zocchi makes a good argument about egg-shaped d20s, what was lacking was any kind of actual testing of how the dice roll. Nowhere could we find any tests of d20s — either GameScience or traditional d20s — to determine whether or not they roll true. As giant fans of dice and an impartial third party, we decided to run a test ourselves and see just how randomly RPG d20s really roll.
We pitted GameScience precision dice against Chessex dice (the largest RPG dice manufacturer) to see what science has to say.
Methodology
For the principal test we used one Chessex d20 and one GameScience d20, both brand new right out of the packaging. The GameScience d20 was inked with a Sharpie to make it easier to read the results, but the dice were not modified in any other way.
The dice were rolled by hand on a battlemat on a level table. For this experiment the dice were rolled on the surface for at least two feet and had to bounce off a flat backstop before coming to rest. This is similar to the requirements of craps tables in casinos. Our logic is that if this method successfully prevents cheating with six-sided dice, it will more than suffice for d20 dice being rolled without any intent to alter the results. (Since casinos are not losing money on gambling, we assume they know what they’re doing).
Each die was rolled 10,000 times, and the results recorded.
Test Results
After an insane amount of dice rolling, here is a quick look at the results for each die:
A casual analysis of the results suggests that neither die is rolling randomly.
If we had a d20 that rolled perfectly, each face would come up 500 times. But of course randomness isn’t perfect and we’d expect some deviation: over the course of 10,000 rolls we’d expect, with 85% confidence, that each face would be within about 33 of 500 — so anywhere from 467 to 533 is within the bounds of randomness. (At 95% confidence the margin of error is 45). Neither die falls within these bounds.
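To make those margins concrete: each face of a fair d20 over 10,000 rolls follows a binomial distribution with n = 10,000 and p = 1/20. Here is a minimal sketch of where the bounds come from (the exact figures depend on the z-values and rounding used, so they land close to, not exactly on, the numbers quoted above):

```python
import math

n, p = 10_000, 1 / 20             # total rolls, probability of any given face
expected = n * p                  # 500 expected rolls per face
sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation per face

# Two-sided z-scores from the standard normal approximation
z85, z95 = 1.44, 1.96

print(f"expected per face: {expected:.0f}")
print(f"per-face SD:       {sd:.1f}")
print(f"85% margin: +/- {z85 * sd:.0f}")   # close to the +/-33 quoted above
print(f"95% margin: +/- {z95 * sd:.0f}")   # close to the +/-45 quoted above
```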
The Chessex d20 had a standard deviation of 78.04, and the GameScience d20 had a standard deviation of 60.89.
While neither die rolled true, it is clear that the Chessex die rolled less true, with a greater degree of deviation from the expected range across more of the die faces. Interestingly, the GameScience die actually rolled very close to true except for the number 14, which rolled vastly less often than it should have, farther off than any face of the Chessex d20. Applying the results to a chi-squared test also confirms that neither die is rolling randomly (even if you ignore the 14 and 7 results on the GameScience die).
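The chi-squared result can be reproduced from the raw-data tables at the bottom of the post. A minimal sketch in Python (the critical value 30.14 is the standard chi-squared table value for 19 degrees of freedom at the 0.05 level):

```python
# Observed counts for faces 1-20, taken from the raw-data tables below
chessex = [395, 417, 576, 567, 488, 622, 396, 443, 542, 581,
           544, 554, 399, 411, 562, 593, 561, 558, 383, 408]
gamescience = [508, 564, 496, 532, 488, 492, 503, 580, 474, 555,
               533, 486, 463, 295, 491, 499, 443, 602, 522, 474]

def chi_square(observed, expected=500):
    """Pearson chi-squared statistic against a uniform expectation."""
    return sum((o - expected) ** 2 / expected for o in observed)

CRITICAL_19DF_05 = 30.14  # chi-squared critical value, df=19, alpha=0.05

for name, counts in [("Chessex", chessex), ("GameScience", gamescience)]:
    stat = chi_square(counts)
    print(f"{name}: chi2 = {stat:.1f}, consistent with fair? {stat < CRITICAL_19DF_05}")
```

Both statistics come out far above the critical value, which is the sense in which neither die is rolling randomly.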
GameScience 14 Theory:
We have a theory as to why the 14 rolled so infrequently on the GameScience d20. Every GameScience die has a small chunk of plastic that sticks out of one face. This flashing is from where the die was removed from the mold. It occurs on all dice, but in Chessex dice this flashing is removed in the polishing process.
On GameScience 20-sided dice this flashing is on the 7 face — directly opposite the 14.
It seems likely that it is more difficult for the d20 to land on the face with the flashing sticking out, pushing the GameScience die off that face. In other words, this flashing makes the 14 roll far less often than it should. Since the flashing position is set from the mold, all GameScience d20s should have the flash in the same position (and all in our inventory do).
Some Confirmation
Since this test was simply one d20 from both manufacturers, it’s possible we just happened to choose the only Chessex d20 that didn’t roll true, and the only GameScience d20 that rolled far fewer 14s. As a check on our results we took another new d20 from both Chessex and GameScience and rolled each under the same conditions.
After 1,600 rolls the same pattern emerged (incidentally, the standard deviation after 1,600 rolls was almost identical to the 10,000 roll test). The Chessex d20 still had more deviation from expected than GameScience, and the GameScience d20 rolled massively fewer 14 results. Both dice still rolled sufficiently out of true to be beyond the margin of error. So this quick (well, not so quick) double check is some confirmation of the 10,000 roll test.
So Which Dice Are Better?
It’s worth stressing that based on our tests you would need a lot of dice rolls before you saw a meaningful difference in any of these gaming dice — roll a thousand times and maybe you’ll see 5 or 10 fewer (or more) of a given number than you’d expect. So for gaming purposes both dice will work just fine. Seriously.
But that said Chessex dice (and in theory any rounded-edged dice) are going to roll less close to true. Because of the randomness of the process that changes the shape of the dice, there’s no way to predict which faces are going to roll better or worse. Indeed this means that you could have dice that are “lucky” and roll high more often or crit more often, and “cursed” dice that seldom roll 20s and fumble more often.
With GameScience dice, on the other hand, you know that the 14 will roll substantially less than any other result — so technically the dice will roll low, but the 20 should roll just about as often as the 1 or the 10. If you carefully cut off the bump on the GameScience dice with a sharp box cutter or X-Acto knife you should get a result that is very close to being truly random.
Raw Data
Here is all of the data from the 10,000 roll test, so anyone who wants can subject the numbers to their own statistical analysis. We also include the percentage by which the rolls of each number deviate from the expected 500 per face.
Chessex d20

Number | Qty Rolled | Deviation from Expected
1 | 395 | 21.00% |
2 | 417 | 16.60% |
3 | 576 | 13.19% |
4 | 567 | 11.82% |
5 | 488 | 2.40% |
6 | 622 | 19.61% |
7 | 396 | 20.80% |
8 | 443 | 11.40% |
9 | 542 | 7.75% |
10 | 581 | 13.94% |
11 | 544 | 8.09% |
12 | 554 | 9.75% |
13 | 399 | 20.20% |
14 | 411 | 17.80% |
15 | 562 | 11.03% |
16 | 593 | 15.68% |
17 | 561 | 10.87% |
18 | 558 | 10.39% |
19 | 383 | 23.40% |
20 | 408 | 18.40% |
GameScience d20

Number | Qty Rolled | Deviation from Expected
1 | 508 | 1.57% |
2 | 564 | 11.35% |
3 | 496 | 0.80% |
4 | 532 | 6.02% |
5 | 488 | 2.40% |
6 | 492 | 1.60% |
7 | 503 | 0.60% |
8 | 580 | 13.79% |
9 | 474 | 5.20% |
10 | 555 | 9.91% |
11 | 533 | 6.19% |
12 | 486 | 2.80% |
13 | 463 | 7.40% |
14 | 295 | 41.00% |
15 | 491 | 1.80% |
16 | 499 | 0.20% |
17 | 443 | 11.40% |
18 | 602 | 16.94% |
19 | 522 | 4.21% |
20 | 474 | 5.20% |
This is Just One Test
In the world of science, this is just one very small test. To have relatively certain results we’d need to replicate this test across many different Chessex and GameScience dice — if anyone is interested in running their own test to corroborate or contradict our results, we would love to hear about it!
Once our wrists recover from all the rolling, we may consider a second test ourselves — specifically to confirm the theory that the flash on the GameScience die is what is causing the 14 to roll so low: we want to carefully sand the flash down and retest the same die to see if it then rolls more true.
Disclaimer: we have made every effort to ensure that our testing methodology was as fair and accurate as possible; however, without much more testing we cannot say with certainty whether one kind of die rolls better or worse.
Great write up. You are right that any meaningful difference would be difficult to detect while playing. Think of how many games you’d have to play to roll a d20 1,600 times. It would take years!
From a practical standpoint, I don’t think it matters all that much. But like you, I want a die that “rolls true.” If GameScience could figure out a way to eliminate the flashing issue, then it seems clear they would have a more accurate product.
As you mentioned, it’s only one test. Having done a previous comparison of two dice with only 200 rolls, you have my utmost respect and sympathy!
Thank you.
I wouldn’t mind this slight deviation on the precision dice if they made a very slight change to the mold.
Put that flashing on the 20.
That’s pretty harsh.
Makes it harder to roll a 1, seems fine. :p
Glad to see that you were at least a little critical even with your preferred dice, the GameScience.
Your work points to a problem that I had already detected in my “precision dice”. The flash that all of them have creates an imbalance that favors other results.
It’s not so important in game, but when you buy dice that claim “true rolls” you want “true rolls”.
Congratulations on your huge job.
Exactly right. I want my dice to be as random as possible.
I wonder how difficult it would be to file down the flash on the 7 face on the GameScience die to make it roll more true.
It’s not difficult at all, actually. I bought a pack of GameScience at GenCon this year and as soon as I got them home I filed the flash off very gingerly with some of my diamond files from miniatures prep. You can barely tell it was there now, and the die rolls great.
oh no – now you made it worse :)
It’s actually extremely easy with an X-acto knife. You make the flat of the blade rest against the side of the die as flat as possible, and cut the blemish off gently. Repeat the process on the other side of the die touching the blemish. Repeat on both sides gently, until the angle is smooth and follows the original shape of the die. Takes a minute to do it properly.
Very nice! I’ve used GameScience dice personally for a while now; it’s nice to see confirmation for my preference, though it looks like I’m going to have to work on the flashing on all my dice…
Here’s my question: does GameScience offer a d30?
lookee: http://dungeonsndigressions.blogspot.com/2009/08/odds-ends-gamescience-d30-d-on-dvd-r.html
I’m not sure where you’re getting the “standard deviation” from – what mean does it refer to? For a test like this I think the correct thing is a chi-square comparing the observed to expected rolls.
By that standard I get hugely significant deviation from expected in your data – the probability that you would get these rolls with a fair die is 2.19 x 10^-44 for the Chessex die and 2.02 x 10^-22 for the Gamescience, many orders of magnitude more unlikely than the “five sigma” criterion in physics required to declare phenomena as nonrandom. I don’t think any further tests are necessary, except of multiple dice to see whether the deviations are random by die or systematic across the production process.
Doesn’t the chi squared test require a standard distribution to be effective? A d20 has a proportional distribution.
However, my analysis also agrees that the results are not random; however, the GameScience die gets very close to random (but not quite) once you eliminate the 14 result.
Proportional distribution? A “fair” die should have a uniform distribution.
An expected count is what you need for your chi-squared, and in this case it would be expecting each number to show up 500 times.
Using this data to show that it is statistically significant that the Chessex d20 is not rolling each number evenly is an interesting finding!
Yes, sorry, the d20 has a uniform distribution.
The GameScience also rolls outside of random – though aside from the 14 it’s closer than the Chessex. But in order to tell the difference for either of them you need over 1,000 rolls.
Central Limit Theorem… you do not need nearly 1,000 rolls to see a difference; usually 30+ is enough.
I can confirm these p-values as well. I have a p-value of 2.1881836708123E-044 for the Chessex die and 2.01956872831409E-022 for the GameScience die.
One way to look at this is they’re both obviously NOT uniformly distributed. Another way is that the GameScience die is 9 sextillion times as likely as the Chessex die to be fair. (I kid! that’s actually a horrible way of looking at it.)
Most serious gamers have simple tools that can clean up your GameScience dice. I do this to all mine. If you have cutters like the Games Workshop ones to cut plastic models, then you are halfway there. A small file and you are done. Snip and file: easy to do, and it just makes great dice better.
Like Daniel says, I fix the flash with my clippers and craft knife and get a fine result.
Otherwise, good article. Thanks for the effort.
Am I the only one who feels weird about the idea behind buying dice specifically to be fair which I then need to file, sand, and ink myself to make closer to fair?
I’d expected the GS fans to be upset by this piece as I read it, but everyone seems pleasantly chill with the idea of custom-modifying their RNG to make them more random.
It has always been my understanding that you’re *supposed* to file down your game science dice, and I always have.
Game Science Dice are sold with some assembly required. They’re not ready to use out of the box. It’s not as though this article is exposing a flaw in the game science design. It’s just a different preference. Some people like to buy hamburger patties pre-packaged, and other people like to buy raw hamburger and make the patties themselves because they think it tastes better.
I suppose that’s why I’m feeling good about these results. Even without the modifications which I’ve always viewed as expected, Game Science is significantly more random. It’s not perfectly random, mind you, but that’s never been a secret either. Lou Zocchi even says in the video (if I recall) that his dice are made to about 1% of the standards for casino dice.
Brilliant study! A large data set was missing from all of our musings and it seems that there is some merit to the claims made by Gamescience – but that 14 issue is rather disconcerting. For the sake of completeness I’ll mention I looked at dice shape, but I didn’t do any roll data so it was only theoretical.
Maybe I am missing something, but there are 20 possible outcomes. So about one of them can be expected to be at a frequency outside the 95% confidence interval.
That’s not exactly how it works. A 95% confidence interval just means that if you were to repeat the test 100 times, 95 times the results would fall within the margin of error of the test results.
The results we’re seeing in this test (especially with some confirmation with the second 1,600 roll test) are clearly outside the bounds of randomness.
No, I’m pretty sure your use of the standard error is flawed. It would be easier to tell if you posted your exact method.
Only if you do your calculations incorrectly. You should have a 95% confidence interval for 20 results, which is larger than it would be for any one result.
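One way to probe the multiple-comparisons question being debated here is simulation: even a perfectly fair d20 will usually have at least one of its 20 faces fall outside a per-face margin. A rough sketch (467-533 is the article’s 85% per-face band; the seed and helper name are arbitrary):

```python
import random

random.seed(42)  # arbitrary fixed seed so the run is repeatable

def fair_d20_counts(rolls=10_000, faces=20):
    """Simulate a perfectly fair die and tally how often each face lands."""
    counts = [0] * faces
    for _ in range(rolls):
        counts[random.randrange(faces)] += 1
    return counts

trials = 200
outside = sum(
    1 for _ in range(trials)
    if any(abs(c - 500) > 33 for c in fair_d20_counts())
)
print(f"{outside}/{trials} fair-die trials had a face outside 467-533")
```

In runs like this the fraction is typically well above half, which is the point being raised: a single face outside the per-face band is not by itself proof of bias, while the chi-squared test considers all 20 faces at once.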
Dear Tester: I think that the GameScience video said you should clip the nub of plastic off. He even sells dice with no paint in the numbers, as these should be even “truer.”
Maybe it was ETA later, but the article specifically mentions that they inked their test die to make it readable.
Nice work!
If you compare this experiment with calling the RANDBETWEEN() function in Excel 10,000 times, the Chessex dataset has a chi-square value of 260, the GameScience has a chi-square of 150, and RandBetween will typically give a chi-square in the low twenties. (It recalculates every time you change something in the spreadsheet, so you’ll see a range of values.)
What values are adjacent to the 14 and the 7? If you sort the counts by how far outside the expected value they are and start throwing the worst ones, you only have to throw out the three or four worst faces on the GameScience die (in order: 14,18,8,2) to drop down into the same range of chi-square values that RandBetween displays. Whereas the chi-square test shows the Chessex die being out of true now matter how many “bad faces” you discard. So I think that could show support for your hypothesis about the flashing.
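The same sanity check works without Excel. For anyone who prefers Python, a fair simulated d20 typically lands near the chi-squared mean of 19 (for 19 degrees of freedom), matching the “low twenties” figure above. A quick sketch (seed arbitrary):

```python
import random

random.seed(7)  # arbitrary fixed seed for a repeatable run

def chi_square_of_fair_rolls(rolls=10_000, faces=20):
    """Simulate fair rolls, then compute the Pearson chi-squared statistic."""
    counts = [0] * faces
    for _ in range(rolls):
        counts[random.randrange(faces)] += 1
    expected = rolls / faces
    return sum((c - expected) ** 2 / expected for c in counts)

stat = chi_square_of_fair_rolls()
print(f"chi-squared for one simulated fair d20: {stat:.1f}")
```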
Be wary of using Excel to pattern randomness. Most programs are only capable of achieving pseudorandomness due to the limited design of the Rand() function of the .net framework. Nowadays we use cryptography to achieve better randomness but I don’t think Excel utilizes those methodologies just yet.
In the first set of Game Science dice I ordered, the d20 not only had a flash issue, but one whole corner had been sheared off in the process of removing it from the mold. To their credit, Game Science promptly replaced the d20 when I sent them a photo of it, but I was disappointed. Given the many environmental variables that affect d20 usage in RPGs, I’m not convinced that it’s significantly better to have “precision” dice.
Thanks for taking the time to roll, tally, and report. That’s a crap-ton of die-rolling!
Thanks for the great analysis.
I think having clean, fair dice is actually a lot more important to a game than people give it credit for, and worth the time and effort spent, at least for games you play a lot.
I play ASL, which has a lot of 2d6 rolls. I spent the money to upgrade to some high-quality precision dice, which honestly I didn’t expect to have a major impact. But after just a few games (probably a couple hundred rolls) I could notice a difference in the texture of the game. It’s minor, but noticeable, and makes for a better game.
It’s also a better for games like Settlers or d20 RPGs if players have a lot of confidence in the fairness of the dice. The Chessex d20s are notoriously suspect, as are the standard lightweight wooden d6s included in a lot of euros.
I spent the money to upgrade to some high-quality precision dice, which honestly I didn’t expect to have a major impact. But after just a few games (probably a couple hundred rolls) I could notice a difference in the texture of the game.
http://lmgtfy.com/?q=confirmation+bias
Oh sheesh who cares? The difference is sooooooo minuscule, and your dice are still difficult to read. Did you test them after rubbing them with a crayon so you can read the numbers? I’ll take minor inaccuracies over convenience any day.
Anecdotal, non-scientific data point: the new GameScience dice, which I deem to be anything made post-2000 (and possibly earlier), are not as good as the old ones made in the 80s. Almost every GS d20 I’ve picked up since 2000 has had one or more convex faces, with the opaque dice having a much higher incidence. It is obvious in use, as many of the newer GS d20s spin around on their convex faces when coming to rest. That never happens with my old 80s vintage models.
That is quite the intense test and I do have to say that I’m impressed with the results, but I would like to point out one thing in the test results. The GameScience video that was referenced already discusses the flashing that caused the decreased number of 14 rolls and instructs that you should shave or sand down the bump. Granted, I would expect that to be done prior to purchase, but it is intended to be done. I think it would be cool to do this test again with the flash removed and see if the theoretical “near perfect random” result would appear.
In the video he also pretty strongly implies that the blemish doesn’t affect the randomness of the roll, which our tests seem to disprove.
I also would like to repeat the test with a sanded down GameScience die — right now even removing the 14 from the results the GameScience die doesn’t roll true, but it’s very possible that’s due to the extra rolls other adjacent numbers are getting from the 14 not rolling.
One thing is that when Lou was in control of molding the dice, he made sure that there wasn’t anything sticking out: that the sprue was cut off from the die. The blemish he is talking about is a small divot that his dice used to have. Now the company that is using his molds and company name doesn’t cut off that sprue, leaving the customer to clean up the die.
BTW, most all GameScience dice can be purchased pre-inked.
I may be wrong, but it appears that the Chessex die, while showing a wider range of deviation, was still very close to 50% at 10 or less. And you may also notice that on the die documented the ‘low’ results were 1, 2, 7, 8, 13, 14, 19, 20 – which can be paired into numbers on opposite sides of the die (adding to 21). This indicates the die may have been a bit misshapen.
So with another die you may have one that rolls fewer 3s, 4s, 17s, and 18s, but a better chance of other numbers. It also seems (if this holds true) that for every high number you are less likely to roll, it is equally unlikely to roll the matching low number, making the Chessex die, although bearing more deviation, the more ‘fair’ die on average.
I think having to sand down the flashing ourselves is a decent compromise. I can’t imagine that the cost of the dice wouldn’t skyrocket if consumers demanded the maker sand it himself. I say it just comes with the territory.
The GameScience flash problem is a big deal to me. When I buy precision backgammon dice, I don’t have to file down any flash. GameScience has been harping on their trueness for years, with no reference to “if you file down the flash”. The dice don’t roll true, they don’t look good, and they aren’t usable out of the box. I am not surprised they haven’t caught on.
Plus, if you don’t file the flash extremely precisely, you may be making the die ‘not true’. I think doing the sanding to maintain trueness will be tougher than the average gamer realizes, particularly if they are doing a full D&D set.
If you’re checking for manufacturing quality, you need to use more than one die from each manufacturer…
We did double check our results with a second die from each, as the post says, but you’re absolutely right. Like we said, we really need others to repeat this experiment for conclusive results!
By “more than one die,” I meant “a large sample,” not just one more.
Fifty or a hundred, out of multiple production runs.
I expected to read that someone rigged up a dice rolling rig with optical character recognition to do the dice rolling and data collection. . . . it certainly would save on effort, and allow very large data sets with minimal human effort.
I can’t say that I’d want to roll dice 10,000 times BY HAND.
Back in the AD&D days, when some d20 rolls needed to be high and others low, players quickly learned to choose dice that “liked” one end or the other. Might have been dice superstition, but gamers tend to be a superstitious lot when it comes to our dice.
When large numbers of dice became available back in the ’80’s with Koplow’s boxes and later with Chessex’s Pound o’ Dice, players would select dice more or less randomly from a group of like-sided dice for each rolling situation, increasing randomness in the long run.
The new GameScience precision dice do not have a chunk of plastic anymore, because the new mold is filled through the middle of the 0. There are also GameScience precision dice available with precision razor-cut edges. So those are the best roleplay dice ever.
They still have a chunk of plastic — we used brand new GameScience dice straight from the distributor. In fact, if we look at the stock of GameScience dice on our shelf, we still see the flashing on all of them.
I think the new mold has reduced the size of the flash from what it used to be, but it’s still there.
Selecting badly made dice specifically for their less-random high and low rolling is CHEATING now!
So we decided to use only the same dice for everything. We have since bought GameScience precision dice and they are great.
Higher Shannon entropy would mean a truer die. (Shannon entropy can be thought of as the number of units necessary to encode a distribution. More uniform distributions have higher entropy.) It’s true that the Gamescience die has higher entropy than the Chessex die, though note that it’s a difference of about .005 nats (2.987599 nats for Gamescience vs. 2.982468 nats for Chessex; these figures are in natural-log units, where a perfectly fair d20 would score ln 20 ≈ 2.9957 nats, or about 4.32 bits).
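For anyone who wants to reproduce those entropy figures, here is a minimal sketch using the counts from the article’s raw-data tables (entropy computed in nats, i.e. with the natural logarithm):

```python
import math

# Observed counts for faces 1-20, from the article's raw-data tables
chessex = [395, 417, 576, 567, 488, 622, 396, 443, 542, 581,
           544, 554, 399, 411, 562, 593, 561, 558, 383, 408]
gamescience = [508, 564, 496, 532, 488, 492, 503, 580, 474, 555,
               533, 486, 463, 295, 491, 499, 443, 602, 522, 474]

def shannon_entropy(counts):
    """Shannon entropy of the empirical distribution, in nats."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

for name, counts in [("Chessex", chessex), ("GameScience", gamescience)]:
    print(f"{name}: {shannon_entropy(counts):.6f} nats "
          f"(uniform maximum: {math.log(20):.4f})")
```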
On 21SEP12, someone named Volker reported that the plastic used to make Gamescience dice is injected into the mold through the 0. This is not true. To the best of my knowledge the plastic is injected into the mold the same way it has always been injected.

When recording my 20 minute talk about why all dice are not created equal, I pointed out that the Armory made a new mold for its 30 sided shape, which was supposed to eliminate the clip mark. Their master plan was to inject the molten plastic through the 0. However, hot plastic takes up less space than cold plastic, so when the plastic already in the tool began to cool, it also shrank. Because the gate through which more plastic can be added was closed to create the 0, the units they produced had a very pronounced “HEAT SINK” deformity. Because the 0 face and the several faces adjacent to the 0 had pronounced heat sink, the die was shorter on this axis. I think Mr. Volker is remembering what I said about the way the Armory tried to eliminate the D-30 clip mark problem.

Funny thing: I was the first to manufacture polyhedral dice in the U.S. and the only one doing so for 6 years. During all of those years, no one complained about the clip mark until after I began making Diamond, Ruby, Sapphire and all of the other Gem colors. We are aware that the clip mark is a minor blemish and we have been looking for ways to eliminate it without distorting the basic shape of the die.

Tumbling the die 3 different times in a rock polisher takes more material off of some edges and sides than others. During my video, I point out several examples where the 20 has been polished so much that its base is removed, or its hook is gone, or both! In the early 80’s, I brought competitors’ dice to Gencon and had gamers who attended my Polyhedrathon lecture roll the dice 100 times to confirm that they were lopsided.
Before the 100 test rolls began, I predicted which numbers would come up most often because they were on the shortest axis, and in every test, the dice performed as I had predicted they would. I am very impressed that Chessex would take the time to conduct a 10,000 test roll and a 1,600 follow up test. Although these tests are not difficult to perform, they eat up an unacceptable amount of time.

A U.S. naval cadet rolled my bubble dice 1,000 times to see if the face with the bubble would come up more often than any of the other dice faces. He reported that there was no significant difference in the number of times any face came up. For years, people who don’t make dice had been telling me that dice with bubbles will stop more often with the bubble face up, because the bubbled face is missing plastic and therefore lighter. What these authorities on dice manufacturing didn’t know is that every morning, the dice molding tools are cold because they had not been run all night. Consequently, the first few shots made every day will have bubbles, because the plastic which touches the cold mold walls cools faster than the molten plastic in the middle. Because cold contracts, the molten plastic in the center of the die gets drawn away from the middle and creates a vacuum bubble. This explains why dice with bubbles weigh exactly the same as dice without bubbles. They both have the same amount of plastic in them!
Thank you so much for the response! It’s an honor to have you posting here. One small correction I should point out is that it was not Chessex who did the test, but us here at Awesome Dice.
Correction. As you know, heat expands and cold contracts. On line 6 of the last letter I sent, I said ” hot plastic takes up less space than cold plastic” I should have said, Cold plastic takes up less space than hot plastic.
Those who responded to your test seem to have overlooked the fact that 18 faces on the Gamescience die were within 12.5%, whereas only 9 faces on the Chessex dice were within the 12.5% (with a .07% variance). Thanks for sharing the results of your research. Lou
Roger SG Sorolla is correct: this is a job for the chi-square goodness of fit test. It’s not your counts that need to be distributed with a chi-square distribution (hopefully they’re distributed with a uniform distribution); it’s your test statistic. You’ll find that the chi-square goodness of fit test statistic, if I remember off the top of my head SUM[((O-E)^2)/E], is a sum of squared z scores, so it is distributed with a chi square.
I’ve never heard of the method you’re using and, without seeing more of your process, it seems like you’re oversimplifying things to say that the margin of error on a univariate confidence interval is 33, so you should reject if any counts are outside of the 500 ± 33 range. If nothing else you’re ignoring your family-wise error rate.
We did do a chi-square test, which also confirmed that neither die rolled randomly (or rather, that it was incredibly unlikely to be random, of course).
(Statistician by profession)
Apropos the use of the mean and standard deviation, which some people are objecting to: they are useful statistics if one is looking for a particular deviation from the nominal uniform distribution. The SD is a useful statistic to show that the dispersion differs from the nominal sqrt(399/12) one would expect from a uniform over the integers 1 to 20. While there are patterns of numbers that won’t affect the SD, they’re fairly unlikely ones compared to more common things. These would both be evidence of bias of particular types, such as something that shifted the mean upwards or increased the dispersion from uniformity (e.g., something that made middle numbers less likely than low or high). Conversely, the SD being too small would also be useful for showing either a high or low bias, particularly in conjunction with too high or too low a mean. If a die rolled systematically too high, the mean will drift up but the SD will drift down.
The Pearson chi square or similar statistics like the likelihood ratio chi square (aka scaled Kullback Leibler Divergence) are useful for detecting numbers that deviate in an arbitrary pattern. However, they have low power due to what statisticians call a “vague alternative” and thus require the really large sample size used here.
There is nothing wrong with these more focused statistics and indeed they are discussed in industry-standard texts, such as Alan Agresti’s Categorical Data Analysis, now in its 3rd Edition. In many practical problems, including the one here, they have more power and are detecting important deviations from expectations. They won’t detect every possible bias, of course, and might miss the Game Science 14 and 7 thing. While that would cause the mean to shift upwards a bit and the SD to shift downwards a bit, a more direct view of it would come from the frequency table. The cost of a more focused test is that you’re not looking for other patterns.
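As a concrete illustration of these focused statistics, here is a sketch computing the mean and SD of the outcomes (not of the per-face counts) for the Chessex data from the raw tables, against the nominal uniform values of 10.5 and sqrt(399/12) ≈ 5.77:

```python
import math

# Chessex observed counts for faces 1-20, from the raw-data tables
chessex = [395, 417, 576, 567, 488, 622, 396, 443, 542, 581,
           544, 554, 399, 411, 562, 593, 561, 558, 383, 408]

n = sum(chessex)
mean = sum(face * count for face, count in enumerate(chessex, start=1)) / n
var = sum(count * (face - mean) ** 2
          for face, count in enumerate(chessex, start=1)) / n
sd = math.sqrt(var)

print(f"observed mean {mean:.3f} vs nominal 10.5")
print(f"observed SD   {sd:.3f} vs nominal {math.sqrt(399 / 12):.3f}")
```

For this die the observed mean sits just below 10.5 and the outcome SD just below the nominal value, which is the kind of summary-level signal the comment above describes.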
Am I the only one who doesn’t really like the way precision dice roll at the table? They just don’t roll nice. I’ll concede they probably roll “truer” than other dice. But I actually like the thought that certain dice of mine are lucky. If every die rolled perfectly true, that would be kind of a bummer. I love to imagine that I have a certain die that comes up 20 more often. It’s part of the mystery of dice. At a gambling table, yes, I want perfect dice. But at the RPG table, I want the dice to seem to have a mind of their own. I’m not sure why this is. Plus, I like to punish my poorly performing dice by not rolling them. Or by rolling other dice in front of them. If I knew, without a doubt, that they were perfectly true, I would feel guilty about punishing them.
Above your red Chessex bar graph, you stated that the expected result of 500 rolls per face, plus or minus 33, is an acceptable performance outcome. Therein, to conform to the appropriate actuarial accuracy formula for most random outcomes, you should only use ¼ of the 33 base as a yardstick. Because 8.25 is ¼ of 33, only three Chessex faces {5, 9, & 11} meet your randomization criteria. Please review {the other bar graph as posted} and list the findings for GameScience faces which meet the 8.25 accuracy test criteria.
Then, as we at GameScience use a stricter baseline for our testing, the results are diminished to plus or minus 25, wherein ¼ is 6.25. Only one Chessex face {5} meets these GMS stricter criteria for statistical randomization. Please list the GameScience dice faces which meet these more stringent criteria.
The plus or minus 33 is the statistical margin of error for each individual face of the die. If the die were, in fact, completely and totally random, we would see results on each face up to 33 above or below 500 with the quantity of rolls that we made.
You would not take ¼ of that; there is no “more stringent” requirement. A face that is within, say, 10 is not mathematically more random than one that was within 20 or within 33. They are all as random as 10,000 rolls can measure. Another way to look at it is with a chi-square test, which shows the odds of the result being random (and that demonstrates that neither die is random; even if you ignore the 7 and 14, the GameScience die was not random). On the other hand, both dice are so close that it would literally take over a thousand rolls before you could even see the difference.
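Where a per-face band like ±33 comes from can be sketched with the binomial standard deviation of a single face’s count. The article doesn’t state the exact confidence level behind its ±33, but that figure happens to sit close to 1.5 standard deviations; treat the specifics here as an illustration, not the authors’ calculation.

```python
import math

def face_count_sd(n_rolls, faces=20):
    """Standard deviation of one face's count for a fair die,
    treating each face as a binomial outcome with p = 1/faces."""
    p = 1 / faces
    return math.sqrt(n_rolls * p * (1 - p))

sd = face_count_sd(10_000)
print(round(sd, 2))       # 21.79 counts around the expected 500

# A band of about 1.5 SD is roughly +/- 33 counts; how many SDs to
# allow depends on the confidence level chosen for the test.
print(round(1.5 * sd))    # 33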
We did list the exact quantity of times each face of each die was rolled further down below the charts in the article.
Bah!
Say what you will about randomness…
The Gamescience dice’s sharp edges and brilliant transparent colors catch the light like no other and add a little “fire” to every roll.
…now if they could just make the numbers a little easier to read for my old eyes…
I enjoyed reading about your tests. I assumed my dice were GameScience since they have sharp edges and clear translucent colors, and the numbers are not colored. All of my dice are from the mid to late 1980s. I don’t remember any dice having the numbers colored in back then; everyone used crayons to fill in the numbers. So I checked my flashing, and every twenty-sider had the 7 and 14 next to each other, not opposite. In fact, the flashing on all of them was prominent and on the 10, which is opposite the 20! You can imagine my surprise to learn that for the last thirty years I’ve been rolling dice that were much less likely to land on a twenty. It’s time for new dice!
So if “true” is what you are looking for, why use dice? Just get an app. Or if you don’t have confidence in the apps, use Excel. And if you don’t have confidence in the math co-processor of your computer, then ask a stranger on the street to throw a thousand marbles into a subdivided tray and divide by the appropriate number to get a result in the appropriate range.
Or you could pick chits out of a hat, the way we did with the D&D boxed set.
Use Excel? There are multiple reports of Windows having problems with “true” randomness, to the point of hotfixes being issued to address the issue.
In all honesty, 10,000 rolls is not enough to actually tell how randomized a system is going to be. Maybe trends can be found, but they’re hard to prove. That’s the fun of RNG. With 100 million “rolls”, a Perl script (using Math::Random::Secure) had a max deviation of 0.07% ([7] 4,996,409 / 5,000,000). However, the same code with only 10,000 iterations had deviations as high as 7.6% ([10] 538 / 500).
Mathematically, it is possible to tell if a system is not random with 10k rolls (or even considerably less) — if the dice are sufficiently not random.
In the example you give, you observed deviations up to 38 above expected on a 10k roll sample, and in the dice we rolled we saw deviations up to 100 — so we know those aren’t rolling truly random.
For our test we calculated the expected distribution for the number of rolls: results outside the distribution (particularly far outside) suggest dice that don’t roll true. Results within the distribution wouldn’t prove the dice were perfectly random, only that they were as random as that number of rolls could measure.
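The sampling-noise point in this exchange can be reproduced with a short simulation (Python here rather than the commenter’s Perl; the seed and function name are arbitrary choices for the sketch):

```python
import random

def max_face_deviation(n_rolls, faces=20, seed=1):
    """Simulate n_rolls of a fair die and return the largest absolute
    deviation of any face's count from its expected value."""
    rng = random.Random(seed)
    counts = [0] * faces
    for _ in range(n_rolls):
        counts[rng.randrange(faces)] += 1
    expected = n_rolls / faces
    return max(abs(c - expected) for c in counts)

# Even a perfectly fair simulated d20 drifts tens of counts away from the
# expected 500 at 10,000 rolls; the relative noise shrinks as rolls grow.
print(max_face_deviation(10_000))
```

This is why the reply distinguishes the two directions of the test: deviations well outside the expected band (like the 100-count swings observed) indicate bias, while deviations inside the band are indistinguishable from ordinary sampling noise.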
As a scientist I appreciate the methods employed. As a GM, if I get a few bad rolls I just throw in another monster. Problem solved.
One should be able to account for the impact of the deviation from randomness caused by the 14 face by looking at other faces with the greatest deviation. The opposite face (7) is not one of them, as it is close to fair, which makes sense as the blemish is off to one side of the 14 face. In other words, the effect is going to be lopsided.
I also want to report that I bought a couple hundred GS d10s from Gamestation a few years back, inked, and none of them have a discernible blemish. Would someone there have filed them? Were the molds somehow improved?
And I see that Gamescience appears to be back in business as I am corresponding with them about availability and cost for a couple thousand d10s!
Great thread.
A glaring omission: How did you roll the dice? By hand? Mechanically? Were both dice rolled the same way?
Also, why didn’t you test 10 dice from each manufacturer 1000 times instead of 1 die 10,000 times? Is it possible there was some wear and tear over time that affected the dice? Casinos replace dice regularly for this reason.
Rolling methodology is explained in the article. We did roll a second Chessex and GameScience die to get initial confirmation (1,600 rolls each), also as explained in the article.
We didn’t test 10 dice 1,000 times each because the expected margin of error at 1,000 rolls is still somewhat high.
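The trade-off in that reply can be made concrete by comparing the per-face noise as a fraction of the expected count at the two sample sizes. A minimal sketch, assuming a fair d20 and a simple binomial model:

```python
import math

def relative_face_sd(n_rolls, faces=20):
    """Per-face binomial standard deviation as a fraction of the
    expected count, for a fair die rolled n_rolls times."""
    p = 1 / faces
    return math.sqrt(n_rolls * p * (1 - p)) / (n_rolls * p)

print(round(relative_face_sd(1_000), 3))   # 0.138 -> ~13.8% noise per face
print(round(relative_face_sd(10_000), 3))  # 0.044 -> ~4.4% noise per face
```

At 1,000 rolls the per-face counts are roughly three times noisier in relative terms, so a subtle bias that 10,000 rolls can surface would be buried in sampling noise across ten separate 1,000-roll runs.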
I think it should be noted that for the Chessex die, with the 1 at the top and the 20 at the bottom, the center band rolled consistently high, while the ends (1 and 20) rolled far less frequently, including the numbers near the 1 and 20 (7, 13, 19, 2, 8, and 14). Far too uniform to be coincidence.
Conversely, the GameScience die rolled with a bias against the 14 and a bias towards a series of numbers two spaces away from the 14 on the die. This seems to indicate that if landing on the side opposite the seven (the 14) is unlikely, the die ends up rolling not one space away but two. Note that if you remove the 14 from the data, the standard deviation for the GameScience die is half that of the Chessex die. That is an insane difference and very much worth noting. Assuming that the extra 14s would end up on the high scorers gives us an even lower standard deviation, somewhere in the area of a quarter that of the Chessex die. This is just speculation, but if you were to properly trim the GameScience die, your results could be massively better than any other dice.
What really bothers me is that I have other dice which are far more tumbled than Chessex dice. These dice are likely more rounded than even the Chessex dice. I would expect them to roll far outside the normal range and be functionally unusable. If the numbers are that far off, it would be noticeable even in most normal sessions where you may roll 15 or 20 times. That’s a pretty serious difference.