
Ever since the Book of Mormon was published in 1830, critics have tried to show that it came forth as the result of fraud. One of the earliest theories was the Spaulding Theory. As the theory goes, Solomon Spaulding wrote an unpublished novel about a group of Romans from the time of Constantine who were blown off course from Britain to the Americas (check out Part 1 and Part 2). Somehow (never adequately explained) Sidney Rigdon obtained the manuscript and surreptitiously transferred it to Joseph Smith, who added the religious material.
Fawn Brodie put together an appendix in her book No Man Knows My History outlining problems with the theory. (I wrote about this in a post called Debunking the Spaulding Theory.) Most people think the theory has been debunked, though it still has some adherents, such as Dale Broadhurst, who maintains a website in favor of the theory.
Wordprint studies try to determine the true author of a text. The idea of a wordprint is similar to a fingerprint: each person uses a certain set of common words such as “a,” “but,” “and,” and “the” in a way that is unique. By collecting information on word usage, a wordprint theoretically can identify an author.
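To make the idea concrete, here is a minimal sketch of what a wordprint looks like computationally: a vector of relative frequencies of common function words. The word list and sample sentence below are my own illustrations, not the actual feature set used by any of the studies discussed in this post.

```python
from collections import Counter
import re

# An illustrative handful of "noncontextual" function words; the published
# studies track a much larger, carefully chosen set.
FUNCTION_WORDS = ["a", "and", "but", "of", "the", "to", "in", "that"]

def wordprint(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {w: counts[w] / len(tokens) for w in FUNCTION_WORDS}

sample = "And it came to pass that the people of the land did prosper."
print(wordprint(sample))
```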

In 2008, Matthew Jockers, Daniela Witten, and Craig Criddle of Stanford University created a stir when they published a peer-reviewed article in the Oxford journal Literary and Linguistic Computing. The authors concluded that major portions of the Book of Mormon exhibited Sidney Rigdon’s and Solomon Spaulding’s writing styles, thus creating a resurgence of interest in the Spaulding Theory.
Traditionally, wordprint studies have used a statistical technique known as the Delta method. Jockers et al. compared the Delta method to a newer technique called Nearest Shrunken Centroid (NSC). NSC had been used in cancer studies, but this was the first time it had been used in wordprint studies. The Jockers study found the NSC method to be much more reliable than the Delta method. Many New Order Mormons and anti-Mormons were pleased with the study, but there were some big questions about the method.
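For the statistically curious, here is a rough sketch of the Delta method’s core idea as I understand it: z-score each word frequency against the candidate corpus, then score each candidate by the mean absolute z-score difference from the disputed text (lower means more similar). All the numbers are fabricated for illustration; this is not the studies’ actual data or code.

```python
import numpy as np

def delta_scores(candidate_freqs, disputed_freq):
    """candidate_freqs: (authors x words) relative-frequency matrix.
    disputed_freq: frequency vector of the disputed text.
    Returns one Delta score per candidate; lower = more similar."""
    mu = candidate_freqs.mean(axis=0)
    sigma = candidate_freqs.std(axis=0) + 1e-12   # guard against zero variance
    z_cands = (candidate_freqs - mu) / sigma
    z_disp = (disputed_freq - mu) / sigma
    return np.abs(z_cands - z_disp).mean(axis=1)

# Three hypothetical candidates, four function-word frequencies each:
cands = np.array([[0.05, 0.02, 0.04, 0.01],
                  [0.03, 0.04, 0.02, 0.02],
                  [0.06, 0.01, 0.05, 0.01]])
disputed = np.array([0.058, 0.012, 0.048, 0.011])
print(delta_scores(cands, disputed))   # the third candidate scores lowest
```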

In January 2011, Bruce Schaalje, Paul Fields, and Matthew Roper of BYU, along with Gregory Snow of Intermountain Health Care, released a study in the same Oxford journal, Literary and Linguistic Computing, outlining problems with the Jockers study. While acknowledging that NSC is a good method for wordprint studies, they detailed several problems with the Jockers study, noting that a “naive application of NSC methodology” led to “misleading results.” Jockers et al. had used a closed set of seven candidate authors for their study. Schaalje’s study showed that an open set of candidate authors “produced dramatically different results from a closed-set NSC analysis.”
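Since NSC is at the heart of both papers, here is a hedged sketch of the technique using scikit-learn’s NearestCentroid, whose shrink_threshold parameter implements the “shrunken” part (per-class centroids are pulled toward the overall centroid, which zeroes out uninformative words). The data is fabricated, and note the closed-set behavior that Schaalje criticizes: the classifier must pick one of the trained candidates no matter what.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(1)
# 30 "chapters" x 10 "function-word frequencies" for 3 candidate authors.
X = rng.normal(size=(30, 10))
y = np.repeat([0, 1, 2], 10)
X[y == 0, 0] += 1.5          # give two authors a distinctive word habit
X[y == 1, 1] += 1.5

clf = NearestCentroid(shrink_threshold=0.5)   # the "shrunken" part of NSC
clf.fit(X, y)

# A text wildly unlike every candidate is still attributed to somebody:
mystery = rng.normal(loc=5.0, size=(1, 10))
print(clf.predict(mystery))   # returns one of 0, 1, 2 regardless
```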
I reviewed the Schaalje study in depth on my blog. For those looking for a “Cliff’s Notes” version, let me summarize the strengths of the BYU study over the Stanford study:
- Jockers excluded Joseph Smith as a candidate. Jockers noted this weakness and had valid reasons for the exclusion: Joseph frequently used scribes, even for personal letters and journals, so none of the writing samples could positively be identified as authentically written by Joseph. I understand the concern and think Jockers did the right thing, but it is still a MAJOR problem.
- Schaalje’s method gives a “none of the above” option. Jockers’s method had to pick a winner, even if none of the authors was a good match. (A toy illustration of the difference follows this list.)
- The “goodness of fit” test. Schaalje created a method showing that Jockers’s conclusions were much weaker than implied.
- Schaalje’s method was reliability tested against a known author. He ran a test on the Federalist Papers, both including and excluding Alexander Hamilton as a candidate author. With Hamilton excluded, Jockers’s closed-set methodology picked Rigdon, while Schaalje’s open-set method picked “none of the above.” When Hamilton was included, both methods correctly picked Hamilton.
- Jockers used sample texts that were too small (as few as 114 words) for his training set. Jockers noted this as a possible weakness, but Schaalje showed that it was a significant problem.
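Here is the toy illustration promised above. This is not Schaalje’s actual test (his paper uses a formal distributional statistic); a simple distance threshold is enough to show how an open-set classifier can abstain where a closed-set one must crown a winner. All names and numbers below are made up.

```python
import numpy as np

def classify(centroids, authors, x, threshold=None):
    """Nearest-centroid attribution. With threshold=None (closed set) the
    nearest author always wins; with a threshold (open set), a text too
    far from every centroid returns 'none of the above'."""
    dists = np.linalg.norm(centroids - x, axis=1)
    best = int(np.argmin(dists))
    if threshold is not None and dists[best] > threshold:
        return "none of the above"
    return authors[best]

authors = ["Rigdon", "Spaulding", "Cowdery"]
centroids = np.array([[0.05, 0.02], [0.03, 0.04], [0.06, 0.01]])
mystery = np.array([0.20, 0.18])    # far from every candidate

print(classify(centroids, authors, mystery))                 # closed set: picks a "winner"
print(classify(centroids, authors, mystery, threshold=0.1))  # open set: abstains
```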
I have to say that the BYU guys really thought through this problem well. Jockers has plans for an updated study that includes Joseph Smith and others. Judging from the BYU study, I think the Stanford folks have some serious problems. What are your thoughts?
The Stanford Study was rather obviously culled to produce its results. I expect every round of that iteration to reflect similar defenses and culling.
Though it is fun to use their methodology on other sets of writings. 😉
BYU stats pwns Stanford six ways to Sunday. That’s not news, but thanks for the update!
I’m not convinced that wordprinting is a reliable science. I know my personal writing style has changed over the years, and after I copyedit something, it looks significantly different from my initial draft. How would my initial draft compare to my final piece? Same author? Different?
Just my $.02…
I don’t know that word printing is going to solve the issue one way or another. There are many, many great authors who adopt different language characteristics for different characters in their stories. So, a single author can have different word prints.
I don’t buy the “short time writing” argument, either. Just as a simple example, Stephen King generally writes 10 pages per day, but for The Running Man he wrote the entire 214-page novel in a single week. Granted, it may not have some of the complexities of the Book of Mormon, but it also doesn’t have all of the quotes lifted from the Bible and the “And it came to pass…” lines. So word print and other such issues seem weak to me.
Another issue with word print studies has to do with HOW the Book of Mormon came about. If it were a direct translation, where Joseph Smith looked directly at the plates and translated them like we consider translation, then it makes sense that the word prints of different authors would carry through into the English version we have today.
Instead, as various people have pointed out (including Elder Russell M Nelson), for the vast majority of the BofM, JS was NOT even looking at the plates but had his face in a hat with the plates completely covered. There is the theory that specific words were displayed for him on a stone in the hat, which wouldn’t disappear until they were written absolutely correctly by the scribe. But this doesn’t really work either given all of the corrections that had to be made after transcription (as pointed out in the great effort by Skousen).
So, all we can conclude is that JS was inspired by the plates. He expressed divine thoughts in his dialect of KJV English. When ideas in the KJV version of the Bible were “close enough” he just quoted them directly.
A word print study would likely match either Joseph Smith himself, or possibly the KJV “authors”.
Frecklefoot, the BYU guys were aware that writing styles change, and they also compared early Sidney with late Sidney.
Jockers et al. had a footnote noting that Stanley Fish does not view wordprints as reliable, so skepticism is understandable. One of the interesting things to me from the Schaalje study was the test on the Federalist Papers. While Schaalje noted some false positives even with his test, he showed his new method was much better and more reliable, and he is still working on ways to cut down on the false positives. Still, I was impressed with the Federalist Papers exercises. Such reliability testing is crucial for skepticism to be diminished, IMO, and I think that was a valuable contribution from the BYU professors.
Hi MH,
It’s been a long time since I’ve ventured into debating much of Mormon history and I’m not really sure why your post induced me to add a comment to this one. So as this is my first post on Wheat and Tares, I hope you will be kind.
I actually followed the debate on MADB between Dale Broadhurst (Uncle Dale) and Bruce Schaalje. To be fair, I felt like both of these gentlemen made good points. I don’t believe either side won the debate, but by the same token, neither side lost. Therefore, as much as members would like to believe the Jockers study has now been completely debunked, that’s not intellectually honest.
Mr. Schaalje used vector analyses of the variances in writing styles among the chapters to show the problems with the Jockers study. As Mr. Broadhurst correctly pointed out, these variances do show that Jockers has some problems, but Bruce’s analyses raise even more significant problems for the official story of the way the BoM was put together. For example, according to Bruce’s vectors, there are large variances between chapters in Alma that don’t match each other, but cluster up with writings from 1st and 2nd Nephi. According to Mormon, the small plates of Nephi were included unedited by him with his abridgement of the large plates of Nephi. (Those pesky 116 pages keep getting us in trouble.) Similar problems can be found with matching styles between Mormon’s abridgements and chapters in Ether and Moroni, which again were not abridged by him.
To your point, the Jockers study is in need of much more work. The small group of authors is problematic and we haven’t actually seen the updated study with Joseph Smith included as a possible author. However, as you also pointed out, the use of word prints using the NSC method is valid in determining writing styles and authors if the study is properly controlled. To sum up then, the rebuttal of the Jockers study has actually opened a completely new can of worms for members in trying to sort out who wrote what in the BoM. Both Jockers and Bruce’s analyses show matching writing styles among the small plates of Nephi, large plates of Nephi, Ether, and Moroni. I don’t believe Jockers intended to compare styles between Jacob and Alma or Nephi and Moroni, but that does appear to be a valid proposition for the word print study in determining if the alleged plates actually had different authors for the four major sections of the BoM.
On a side note, do you believe there would be any value in running a word print study on Christopher Nemelka’s “The Sealed Portion”? It might be interesting to see how that turned out… 🙂
If Jockers et al. write a short rebuttal, then their study will already be cited twice (once by themselves and once by BYU) after one year, which will be above average for the journal. In addition, since they will have 2 articles on the subject and BYU will have one, they will be the NSC wordprint champions. If more people pitch in, they will be cited even more. If they adjust their method quickly and get that published, they will be the inventors and leaders in using the method.
If they are “more computational” than their colleagues in their department, the ones that don’t get math will give them a pass.
All in all, it is a win for the Stanford folks from the academic point of view.
Oh wait a minute, what is academics for?
Doug, so glad to see you comment! I was hoping you would see this topic, and I’m glad to see you’re following W&T after all! (This does seem to be a topic you had considerable interest in before.) I have heard rumors that Jockers included Joseph Smith in a new study, and that JS made little difference in the closed-set analysis. But I will be curious to see Jockers apply the goodness of fit test to Rigdon/Spaulding, and will be curious what the Stanford folks do with the open-set methods, which seem to have greatly different results.
Do you have a link to the MADB discussion? I’d like to see some more details. I don’t think I can comment intelligently without a bit more information on this vector analysis. (Was that Principal Components Analysis or Vector analysis, or a combination? The paper specifies PCA.)
I will say that in discussions with Dale in the past, it seems to me that he wants to attribute individual chapters to specific authors. As stated earlier, there is a bit of healthy skepticism of wordprints, and I don’t know that Dale’s approach is without problems. The false positive problem is still an issue, and Schaalje plans further work to address it, but IMO he has made huge progress over the Jockers study. Schaalje also mentioned the problem of samples that are too small. There is a debate about whether the modern LDS chapter divisions are valid divisions (both Jockers and Schaalje used modern LDS chapters). I don’t know how much Dale has addressed that issue, or whether he views it as an issue at all.
Certainly wordprints can’t solve all issues. Dale’s research sounds interesting.
I don’t believe Jockers intended to compare styles between Jacob and Alma or Nephi and Moroni, but that does appear to be a valid proposition for the word print study in determining if the alleged plates actually had different authors for the four major sections of the BoM.
Well, as I mentioned in my last post on Dueling Wordprint Studies, Terryl Givens has noted that BYU has already done a considerable amount of work “compar[ing] styles between Jacob and Alma or Nephi and Moroni”, and have concluded many different authors contributed to these books.
I don’t know Christopher Nemelka. Can you expound?
The whole study seems misguided and misunderstood. It insinuates that a Google Translate-type translation of the Book of Mormon took place book by book, where the text from Nephi or another book was entered on the Chaldean, Arabic, Reformed Egyptian, or Assyrian side, and English was spit out the other side. That is not how it happened.
It is my understanding that Joseph used the seer stone to verbalize the concepts, which were then recorded by a scribe. With this said, the writing style of the Book of Mormon (the entire book) would take on the writing style of the scribe, and not necessarily the writing style of the author. The 1830 edition has also been modified: there are approximately 3,900 changes from the 1830 to the 1953 version. The lion’s share of the changes are grammatical, but some are conceptual. Most notably, “God” was replaced by “Son of God” in some places. I have personally compared the 1830 version line by line to the 1953 version.
To me, all of this is irrelevant. The ONLY way one can come to a knowledge of its truthfulness is by applying the promise in the book. I have received that witness. I know it is true, and nothing will change that knowledge.
MH:
You can read about Nemelka here, or by doing a google search of his name.
MH,
Apparently the discussion between Uncle Dale and Bruce wasn’t worth saving after MADB made the transition to their new board with different rules. (At least I don’t seem to be able to find it.) I can’t really answer your question about vector or PCA; perhaps you will have better luck than I at searching that board. If it was vector and is now PCA, Bruce may have taken Uncle Dale’s criticisms to heart and changed his approach to Jockers…
I agree with Matt Jockers’s sentiment stated in a recent thread on MADB. To summarize, he felt that Professor Schaalje had brought out some very interesting things and that his research would further the science. He also stated that he didn’t believe the paper refuted his study, but did help clarify it. Matt sticks by his original premise that with the given 7 authors (closed set), Rigdon/Spaulding are the most probable writers of the book. Given a naturalistic explanation for the BoM, I can think of a few other authors that should be in the mix, but with Joseph Smith included, that group of 8 hits most of the likely suspects minus the control authors.
I think we can all agree with the researchers that if the actual author is in the mix of possible authors, then the NSC method is actually very reliable. Having said that, if the author isn’t in the sample group, the resulting data is not useful. We went round and round last time about the very thing that Will brings up in Comment #9. Given the church’s official stand that the BoM was translated by the gift and power of God and that many times the plates didn’t even need to be in the same room, the only word print style that should be in the book is Joseph’s. After all, no matter how you want to explain the translation process, Joseph had to take what was written on no more than 40 sheets of metal and translate it into over 630 pages of English text. In my opinion, that would wipe out any word print style from whoever originally scribed on the plates. (Given the required compaction of the data, it would seem doubtful that the original authors would have even wasted space with the kinds of meaningless words the study is programmed to look for.) That’s really the big problem here. Every study I’ve seen seems to agree on one thing. Joseph Smith didn’t write it!
As for the “Sealed Portion”, let me refer you to a Mormon Matters discussion on this feller.
http://mormonmatters.org/2009/09/13/have-you-read-the-sealed-portion-of-the-book-of-mormon-yet/
I bring it up because I’m curious what the NSC method would show with the 8 candidate authors and Mr. Nemelka’s word prints in the machine. It would be a good test, as his book reads a lot like the Book of Mormon.
One last thing… If I understand you correctly, you don’t believe we can word print an author like Alma because it would appear too many authors are present in the text. While I agree that makes word printing Alma’s style difficult, it also raises serious issues for me on the veracity of the book. Shouldn’t the individual authors be identifiable? If they’re not, how can you trust what they’re saying is accurate? In other words, if President Monson gave a talk in conference and then three or four editors went to work changing his speech enough to get their word style in the text, you wouldn’t trust what came out. Why is this different for the BoM? It just doesn’t make sense to me. Then again, perhaps I’m just not inspired enough to believe anymore. 🙂
All the best MH!
Matt sticks by his original premise that with the given 7 authors (closed set), Rigdon/Spaulding are the most probable writers of the book.
Yes, that is what his study has concluded, so Jockers is right on that point, but it’s a weak point to stand on. Schaalje has not only shown the flaws of a closed-set method, and introduced a much better open-set method, but showed that Jockers didn’t apply a goodness of fit test. If Jockers had applied a goodness of fit test (sort of like a confidence interval), he would have seen a problem with stating
In a discussion at Mormon Discussions, Jockers stated
Huh? What was the study about then?
As for Nemelka, thanks for the link. For some reason, I don’t remember that post at all. But I agree–it would be interesting to see what NSC does for the lost 116 pages.
If I understand you correctly, you don’t believe we can word print an author like Alma because it would appear too many authors are present in the text. While I agree that makes word printing Alma’s style difficult, it also raises serious issues for me on the veracity of the book.
I neither agree nor disagree with that statement–it’s not the point I was trying to make. (Certainly multiple authors would make this problem more difficult.) Here’s what I’m trying to get at.
Both Jockers and Schaalje have noted a problem with false positives. Jockers falsely attributed an Isaiah chapter to Longfellow, and Schaalje had false positives in the Federalist Papers exercise. Wordprints have been shown to be unreliable in both of these studies for certain applications. The techniques aren’t anywhere close to DNA evidence, so I would be wary to assign individual chapters to authors as Dale is trying to do.
It is as if Dale is taking a Nike footprint and assigning it to Spaulding. I know Dale thinks his forger wore a Nike. It seems to me that Schaalje is saying that the shoe size for Spaulding is wrong in the Jockers study even if Spaulding wore a Nike shoe. (I almost want to quote OJ’s lawyer here: if it don’t fit, you must acquit.) I’m pretty sure Dale isn’t using NSC or the Delta method for his study; I’m not sure what he’s doing exactly because I haven’t studied it in depth. But I do believe that both Jockers and Schaalje would be very wary of using NSC or the Delta method to positively identify individual chapters with certain authors. I think it is a misuse of the method, and these chapters must be aggregated into the entire book to make any conclusions. If Jockers’s conclusion that 229 of 239 chapters were written by Rigdon/Spaulding held up, then that would be pretty darn impressive. The problem is that Schaalje showed with the Federalist Papers demonstration that the closed-set method is subject to wildly inflated numbers. Dale seems to think he can assign chapters to different authors. I’m skeptical.
Your point about Nemelka is an interesting proposition. I reviewed the website very briefly. Does he claim to be the sole translator, or did he have scribes like Joseph (Martin Harris, Emma, Oliver, etc)? I wonder how similar Nemelka and Smith’s translation processes were.
This suggestion is precisely the problem with wordprints in general. Wordprints aren’t designed to tackle this problem. If such a test were performed, I doubt any wordprint would be able to identify the paragraphs edited by the editors. It is a misuse of the technique to believe that these wordprints are anywhere near accurate enough to identify the editors. Dale may think he can do this, but I’d be very skeptical. You’re taking the technique too far if you think the editors can be identified.
The Book of Omni has 5 authors in its 31 verses; one of those authors, Chemish, wrote just 1 verse (verse 9). We have 2 possible assumptions here: the book was written by 5 people, or the book was written by 1 person. If we assume that the book was written by 5 people, then no wordprint will ever identify Chemish as a legitimate author, because he wrote just 69 words. The training set needs to be more than 1,000 words, and Jockers’s study measures the frequencies of 95 words. Chemish’s contribution to the BoM is just too small. Neither Jockers’s nor Schaalje’s technique is sensitive enough to test this.
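A quick back-of-the-envelope simulation shows why 69 words can’t support a 95-word frequency profile. Only the token and feature counts below mirror the numbers above; the “true” wordprint is fabricated, and the toy generously assumes every one of the 69 tokens is a tracked word.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words = 95    # noncontextual words tracked by the Jockers study
n_tokens = 69   # Chemish's entire contribution

# A made-up "true" wordprint over the 95 tracked words.
true_freqs = rng.dirichlet(np.ones(n_words))

# Draw a 69-token "text" and estimate the wordprint from it.
sample = rng.multinomial(n_tokens, true_freqs)
estimated = sample / n_tokens

print(int((sample == 0).sum()), "of the 95 words never appear at all")
print("mean absolute error relative to the average frequency:",
      np.abs(estimated - true_freqs).mean() / true_freqs.mean())
```

Most of the 95 entries come out as zeros, so the estimated “wordprint” is mostly noise; no classifier can recover an author from that.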
If we believe the book was written by 1 person, then I’d be curious to see the open-set vs. closed-set results, and how good the goodness of fit is for the book. Schaalje’s test seemed pretty good with Alexander Hamilton, but still had false positives when Hamilton was excluded. If the goodness of fit test for this book came up for Spaulding, that would be interesting, but far from a smoking gun.
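For readers wondering what a “goodness of fit” check looks like in practice, here is a rough sketch in the spirit of the idea, not Schaalje’s exact statistic: compute an empirical p-value by asking how often the winning candidate’s own known texts sit as far from his centroid as the disputed text does. Everything here is simulated.

```python
import numpy as np

def fit_p_value(known_texts, disputed):
    """known_texts: (texts x words) wordprints of one candidate author.
    Returns the fraction of the candidate's own texts at least as far
    from his centroid as the disputed text; near 0 means a poor fit."""
    centroid = known_texts.mean(axis=0)
    own_dists = np.linalg.norm(known_texts - centroid, axis=1)
    return float((own_dists >= np.linalg.norm(disputed - centroid)).mean())

rng = np.random.default_rng(2)
known = rng.normal(0, 1, size=(50, 10))   # a candidate's 50 known texts
impostor = rng.normal(3, 1, size=10)      # a genuinely different style
print(fit_p_value(known, impostor))       # ~0: the "winner" is a bad fit
```

A closed-set method would happily attribute the impostor text to whichever candidate happens to be nearest; the fit check flags that even the “winner” is implausible.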
I always wonder why, when people come up with new ways of calculating things, they don’t test their method first. Why didn’t Jockers et al. do a test with some known paper first? Maybe they did but just didn’t publish it. I guess they trusted their method. That’s OK; the method is good. The problem was with the “naive application.”
This is something I’ve seen in academic world many times. Is it because new results will get more attention?
#9 Will
Given that the BofM was not “translated” as you mentioned (with which I agree, according to the various accounts), it appears that JS was primarily inspired as to what to write. This is certainly a valid way to bring things forth, as prophets have done for millennia.
In this case, what purpose do you think the plates actually served, if they weren’t actually used for translation? A talisman? Or something else?
Given how much the Book of Mormon quotes from, or follows the literary style of, the King James Bible, shouldn’t wordprint studies include some of the known KJV contributors as potential candidates? I understand that the KJV borrows heavily from Miles Coverdale’s translation, for instance.
yes, you are correct. in the longer version of my post, I mentioned that both jockers and schaalje correctly identified 20 of 21 isaiah-malachi chapters. however, jockers falsely attributed 1 isaiah chapter to the poet longfellow. hence there are problems with false positives.
Listen, my children, and you shall hear,
Of the midnight ride of Maher-shalal-hash-baz.
Could happen.
Mike,
I think the main purpose of the plates was to keep skin in the game, that is, to keep the 11 witnesses and Emma on board with the restoration. They suffered extreme persecution, and without some evidence I think the whole process may have been undermined. Also, the plates did serve as a guide in the translation process.
My great-great-grandfather interviewed Emma several times prior to her death. He was the president of the Eastern States mission and was sent to Nauvoo by Brigham Young. I have his handwritten letters, and Brigham’s handwritten responses. In these letters it was clear she was exhausted by the whole process and wanted little to do with any religion, including the one founded by her husband, or by David or Joseph III.
MH,
I appreciate all the work you put into the response to my questions! I believe we are in agreement for the most part. While the Jockers study fell short of settling the authorship question, the paper itself opened up what I believe to be a valuable investigation into the origins of the book. Of course, for those convinced that the book is of ancient authorship, the study is not very valuable, as Bruce pointed out. On the other hand, for those who believe the book is a product of the 19th century, the Jockers study could be enhanced with just a few more possible authors to help us non-supernatural types better understand who most likely penned it. I believe the addition of Joseph Smith (which supposedly has been done), Ethan Smith (as you suggested), Lucy Mack Smith, and Emma Smith would round out the possible contributors. Perhaps one day someone will do that…
“Both Jockers and Schaalje have noted a problem with false positives. Jockers falsely attributed an Isaiah chapter to Longfellow, and Schaalje had false positives in the Federalist Papers exercise. Wordprints have been shown to be unreliable in both of these studies for certain applications. The techniques aren’t anywhere close to DNA evidence, so I would be wary to assign individual chapters to authors as Dale is trying to do.”
Agreed; the smaller the sample, the greater the chance of false positives. Therefore, individual chapters are difficult to quantify this way. However, it should be rather simple to establish an “authorship footprint” for large books such as Mosiah, Alma, 3rd Nephi, and Ether. If we then find near-perfect matches with “footprints” in 1st and 2nd Nephi, Jacob, or Moroni, one may be persuaded to believe that the same person wrote both texts. I think that was the only point Uncle Dale was trying to make with Bruce S., not that he felt individual chapters could be positively identified to any one particular author. As I stated before, Dale looked at Bruce’s charts and noticed things like Alma 48-51, which didn’t cluster with the rest of the book of Alma, but did cluster with one of the other books that wasn’t abridged by Mormon.
At the end of the day you’ll get no argument from me about Jockers. While I admit to being somewhat disappointed with the way it all played out, it’s still fascinating to study…
To the extent that word-printing has any value at all (and it’s not clear to me that there is any value, or even any potential value), the ways it’s been used to examine the Book of Mormon have been unfortunate and unhelpful. I don’t think the new BYU study has any value except as a tool to undermine the credibility of the Stanford study. Both studies should be dismissed.
john, while I understand the skepticism of wordprints, I am a bit surprised that you want to dismiss both studies. are you saying that wordprints simply aren’t valid tools, or that the 2 studies aren’t of any value?
I was going to ask John the same question, because I suspect he’s decided the issue of BofM historicity in favor of non-historicity on the historical evidence and sees no interest in addressing the issue of faith claims on any other basis than faith.
john, let me ask another question related to a wordprint. I heard you discuss the book of deuteronomy. many scholars have noted a very different word style compared to the other books of moses. I think this could be a form of wordprint, though not nearly as complicated as nsc or delta method. many have concluded (as I think you do) that deuteronomy was written by a different author. are you saying that wordprints are without value?
The problems with Deuteronomy have not been established via statistical nonsense calibrated to legitimize the biases of the people who input the criteria (i.e., “wordprinting”). Deuteronomy is replete with anachronisms and the use of a chronologically later variant of Hebrew than the much earlier component texts of Genesis, Exodus, Leviticus, and Numbers. Deuteronomy’s status as a forgery has been established by legitimate literary analysis, e.g., noting that certain words that exist in the text were introduced into Hebrew long after the source texts of the other four books of the Pentateuch were composed.
Yes, I’m saying that computerized wordprints are without value.
Firetag is right: The problem with this debate is that the battle is being fought on multiple fronts. People who believe on the basis of faith that the Book of Mormon is an ancient text are the most numerous participants in the debate, but they have no place in the debate. And I presume that the BYU folks are among this crowd.
The Stanford study was irresponsible because it applies a pseudo-science to a question that lacks a feasible historical possibility. In other words, the Stanford folks might have questioned whether William Tyndale wrote the Book of Mormon, input a bunch of parameters into a computer, and gotten the result that there is a marked similarity between the Book of Mormon language (which is clearly modeled on the King James Bible) and Tyndale’s writing (given that Tyndale was a major source of the KJV). That analysis would be meaningless because there is zero chance in a historical sense that Tyndale wrote the Book of Mormon. Applying “wordprinting” to this question is absurd.
Likewise, for the faithful who believe in the Book of Mormon’s antiquity, applying wordprinting to prove the Book of Mormon is ancient has the same value as “proving” with computer analysis that Deuteronomy was written by Moses or the Donation of Constantine was written by Constantine. We already know that these are not the case.
Ah, John. The physical sciences aren’t quite certain that ANY of the social sciences are more than pseudo-sciences. 😀 Be kind.