Theological Data Mining

Saturday, July 5, 2014

Data Sources: Which Books Belong in the Bible?

Last weekend I saw the Book of Mormon musical and it reminded me of a joke I heard on a Jewish radio show: "Why did God create Mormons?" ... "So that Christians could understand how Jews feel." The joke implies that the New Testament is analogous to the Book of Mormon, which most Christians reject. Ironically, modern Judaism has its own "new testament", the Talmud, which also is analogous in some ways. In fact, many of the differences between various religions can be attributed to differences in the holy books they consider authoritative (i.e., their "bible" canon).

So which books belong in the Bible?

I think that is the wrong question to ask, and I think it comes from the natural (but often irrational) human desire for certainty, facilitated by concrete, black-and-white category distinctions. The problem is that even if the writings themselves are divinely inspired, inerrant, and infallible, our ability to recognize and classify them as such is not. Thus, instead of regarding particular books as part of an authoritative canon, I think it's more useful to regard all of them as data sources and treat them as such.

Treating them like any other data sources, my answer to "Which books belong in the Bible?" is "as many as can practically fit". That could include the Tanach ("Old Testament"), Apocrypha, New Testament, Talmud, Gnostic writings, Qur'an, Book of Mormon, Bhagavad Gita, Tripitaka, and many others. It includes some that are very accurate and useful, some that are spurious and useless, and some that are largely unreliable yet contain a few useful data points. In other words, it's a lot like the data sources used by scientists (e.g., for things like weather prediction).

Data doesn't have to be perfect to be useful, especially for probabilistic beliefs. Even datasets that partially contradict each other can have value. For example, the New Testament, Talmud, and Qur'an all agree on some things and disagree on others. Thus, we can have a relatively high level of confidence in beliefs & doctrines on which they all agree, and lower confidence in beliefs where they contradict each other. Of course, that in no way implies that they are equally true or should be given equal weight. Not at all.

It's impossible to read every book ever written about God, many of which contain mostly noise. My solution, as with other types of data, is to start with those that are the most accurate (according to history & archaeology), ancient, widely accepted, and relevant, then add more. Using my estimations, that usually means starting with the Torah and Nevi'im ("Law and Prophets" -- which also apparently were Jesus' primary written data sources). They are the most widely accepted and ancient, and they make claims of divine inspiration that can be scientifically tested. If there's room for more data, I then add the Ketuvim ("Writings"), Apocrypha, New Testament, and Mishnah. Then the Essene writings, Jewish Pseudepigrapha, early Jewish writers (Philo, Josephus, Targumim, etc.), Ante-Nicene Fathers, and Gemara. Beyond those, I think the data gets very noisy but still has value in some cases.

This methodology is much different from that of many Christians, Jews, and Muslims, who derive much of their theology from the more recent and less widely accepted books, then interpret (and sometimes translate!!) the Torah and Prophets through the lens of those. That method may still lead to correct theology, but I find it less justifiable from a scientific perspective, and it has a tendency toward circular reasoning.

Thinking of the "Bible" as a collection of data sources also illustrates how unreasonable and unscientific some of the objections to it are. For example, many arguments against belief in God focus on alleged errors and contradictions in the Bible, usually about very insignificant details. Others make a big deal about the rejection of certain non-canonical books and the fact that some canonical books weren't accepted until long after they were supposedly written. Assuming those objections are valid (which is debatable), they're basically equivalent to "a few data points aren't perfect, some of the data contains noise, and you threw out a few data points that maybe you should've kept". In other words, it's like practically every other dataset that scientists rely on.

Saturday, May 17, 2014

Faith and the Overconfidence Effect

I once heard that if you're not 100% sure (about God), you're 100% lost. I couldn't disagree more. I would rephrase it like this: If you are 100% sure, you don't have faith.

Faith does not mean "belief without evidence", but it does imply uncertainty. A good definition of faith is "confident trust despite uncertainty". If you're 100% sure because you have absolute knowledge, you have no need for faith. If you're 100% sure but don't have absolute knowledge, you're self-deluded. If you're just a little less than 100% sure, you're probably under the influence of a pernicious cognitive bias: underestimation of uncertainty, also known as the Overconfidence Effect.

The Overconfidence Effect is a pervasive and well-documented human bias where the level of certainty in one's beliefs is usually much higher than the accuracy of those beliefs. It has been studied by asking people to answer questions (e.g., the spelling of difficult words) and then asking how sure they are that each answer is correct. Those studies found that when people were "100% sure", they were wrong approximately 20% of the time. When 99% sure, they were wrong 40% of the time, and when 90% sure, they were wrong approximately 50% of the time. That should put human certainty into perspective!
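The comparison described above (stated confidence vs. how often the answers were actually right) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical and the responses are made-up numbers, not data from the studies mentioned.

```python
from collections import defaultdict

def calibration_table(responses):
    """Group (stated_confidence, was_correct) pairs by confidence level
    and return {stated_confidence: observed_accuracy}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for confidence, was_correct in responses:
        totals[confidence] += 1
        correct[confidence] += int(was_correct)
    return {c: correct[c] / totals[c] for c in sorted(totals)}

# Made-up illustrative answers: (stated confidence, whether the answer was right)
responses = [
    (1.00, True), (1.00, True), (1.00, True), (1.00, True), (1.00, False),
    (0.90, True), (0.90, False),
]

table = calibration_table(responses)
for confidence, accuracy in table.items():
    # A positive gap means the person was more confident than accurate.
    gap = confidence - accuracy
    print(f"stated {confidence:.0%}, observed {accuracy:.0%}, gap {gap:+.0%}")
```

A well-calibrated person would show gaps near zero at every confidence level.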

I think even 90% is unreasonable for theological beliefs, despite the high certainty that so many believers and atheists seem to have. 90% certainty implies 90% probability. Starting with the principle of indifference, a 90% probability that God exists (or doesn't exist) would require very strong evidence. Though I think there is solid evidence for the God of the Bible, I haven't seen enough for 90% certainty either way. Certainty that a particular religion or systematic theology is the "correct one" would require that plus a lot more. I haven't seen it yet, but that's no reason not to have faith in whichever of the available possibilities is most probable according to the evidence we do have.

I've heard many times, "If God wants us to believe in him, why didn't he give us more evidence?" I think the question totally misunderstands faith and the Bible's message about it. Faith does not mean believing God exists. The God of the Bible really didn't seem to care that people believed he existed. What mattered was whether they trusted his promises and lived in a way that reflected confidence (despite uncertainty) that he would be faithful to those promises. That's very different from the alternatives, such as the faith of atheism (i.e., living in a way that reflects confidence that there is no God and thus no divine promises to be fulfilled). If there were sufficient evidence (or philosophical arguments) to know without any doubt, such choices would mean very little.

Whether we admit it or not, we all have faith because we all make decisions amid uncertainty. Uncertainty isn't such a bad thing. It makes us more humble about our beliefs and more respectful of the beliefs of others, which makes us more open to the truth, in the (very likely) case that we aren't totally correct about everything we believe. Uncertainty also makes faith a lot more meaningful.

Sunday, April 27, 2014

Extraordinary Claims and the Principle of Indifference

You've probably heard the saying "Extraordinary claims require extraordinary evidence." It's the starting point for perhaps the most common argument by atheists: "The existence of God is an extraordinary claim that lacks extraordinary evidence." Seems logical, right? The only problem is, how can we determine whether the evidence (or the claim) really is extraordinary?

One common definition of extraordinary is "very unusual". But the claim that God exists isn't unusual. By that definition, "There is no God" would be a more extraordinary claim. But that standard doesn't always make sense. For example, someone could make the very unusual claim that I'm currently wearing three socks, but most people wouldn't require extraordinary evidence to be convinced. Another common meaning of "extraordinary" is "very remarkable or amazing". That one brings us right back to the original problem: How can we determine how remarkable or amazing a claim is? There are other definitions of "extraordinary" but all are similarly problematic.

A much more scientific way to formulate "Extraordinary claims require extraordinary evidence" is via Bayes' theorem. In Bayesian terms, an extraordinary claim is a hypothesis with a very low prior probability (e.g., “a coin flipped 5 times will land on tails every time”, which has a prior probability of around 3%). It follows that very strong evidence is required to move the probability high enough to believe the claim. Thus, it can be shown mathematically that extraordinary claims (defined this way) do in fact require extraordinary evidence. In the above example, that evidence could be a measurement that the coin's weight is very unbalanced or an observation that it has tails on both sides.
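The coin example can be worked through with the odds form of Bayes' theorem (posterior odds = prior odds x likelihood ratio). The sketch below is a minimal illustration with hypothetical function names; the likelihood ratios are chosen only to show how the required strength of evidence grows as the prior shrinks.

```python
def posterior(prior, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

def required_likelihood_ratio(prior, target):
    """How many times likelier must the evidence be if the claim is true
    (vs. false) for the posterior probability to reach `target`?"""
    target_odds = target / (1.0 - target)
    prior_odds = prior / (1.0 - prior)
    return target_odds / prior_odds

# Fair coin, five tails in a row: (1/2)^5 = 1/32, about 3%
prior_five_tails = 0.5 ** 5

print(f"prior: {prior_five_tails:.3f}")
print(f"after evidence with ratio 31: {posterior(prior_five_tails, 31):.2f}")
print(f"ratio needed for 90% from this prior: "
      f"{required_likelihood_ratio(prior_five_tails, 0.9):.0f}")
print(f"ratio needed for 90% from a 50% prior: "
      f"{required_likelihood_ratio(0.5, 0.9):.0f}")
```

The lower the prior, the larger the likelihood ratio needed to reach the same posterior, which is the mathematical content of the "extraordinary evidence" slogan.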

Applying that framework to the God claim, the strength of evidence required depends on a priori assumptions about the prior probability that God exists. Theists who start with a relatively high prior probability require less evidence. Atheists who start with a low prior require more evidence. Arguments about the sufficiency of the evidence for God become circular on both sides. Thus, it's imperative that we have a good, objective way to determine the prior probability.

Because we don't have specific, definite probabilistic information about the God question, we must use an uninformative prior. The simplest and probably most common of these is the principle of indifference, which says the prior probabilities of all hypotheses are equal. In the binary case of “Does God exist?”, the prior is 50%. Starting with a 50% probability may seem crazy if the claim seems ridiculous, but it makes good sense mathematically. The evidence (or lack thereof) is probably what makes such claims seem ridiculous in the first place, and the other terms in Bayes' rule account for that. Also, if the claim seems ridiculous to most people, that fact alone is evidence that would reduce the probability.
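As a rough sketch of how an indifference prior feeds into Bayes' rule: every hypothesis starts with the same probability, so the posterior is driven entirely by how probable the evidence is under each hypothesis. The function names and the likelihood values below are hypothetical placeholders, not estimates of anything.

```python
def indifference_prior(hypotheses):
    """Principle of indifference: equal prior probability for every hypothesis."""
    p = 1.0 / len(hypotheses)
    return {h: p for h in hypotheses}

def update(prior, likelihood):
    """Bayes' rule: posterior is proportional to prior * P(evidence | hypothesis)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

prior = indifference_prior(["claim is true", "claim is false"])

# Hypothetical likelihoods: how probable the observed evidence is
# under each hypothesis. These numbers are placeholders.
likelihood = {"claim is true": 0.6, "claim is false": 0.2}

posterior = update(prior, likelihood)
for hypothesis, probability in posterior.items():
    print(f"{hypothesis}: {probability:.2f}")
```

With equal priors, the prior terms cancel during normalization, so the result depends only on the ratio of the likelihoods.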

Using the principle of indifference, presuppositions about the probability of God's existence are eliminated as determining factors. The estimate of the probability that God exists now depends entirely on the evidence. In this case, “Extraordinary claims require extraordinary evidence” is a meaningless argument. It doesn't matter how extraordinary the claim is because the evidence will tell us whether to believe it. We'll still argue about the evidence and how to assign probabilities to it, but that's a lot more useful than debating a theist's circular argument vs. an atheist's circular argument.

There are other ways to determine uninformative priors, including some that let us use the “extraordinary claims” standard. But when applied to the God claim, they generally require arbitrary assumptions that lead to self-fulfilling conclusions. That might be good enough for testing the claim that I'm wearing three socks right now, but whether or not to believe in God is a much more important question – one that I don't think should be decided (either way) by arbitrary assumptions made before examining the evidence.

Saturday, April 12, 2014

Free Will in the Bible: Overfitting + Confirmation Bias

A major theme of this blog is that we shouldn't force data to answer questions it doesn't actually answer. Overfitting and confirmation bias can have an insidious synergy. I believe the debate over free will in the Bible is one such example. Before I discuss it, I need to define it, because there are two types of free will that people often confuse:

Free will in the legal sense: freedom to make voluntary choices without coercion. In other words, freedom to choose what we want to choose.
Free will in the philosophical sense: the ability to make choices that aren't determined by prior causes. In other words, what we want to choose might be influenced by God, genetics, environment, etc. but isn't completely determined by them.

Another important term is "determinism", which is the idea that all events are caused by prior events or conditions.

Despite some caricatures I've heard, practically everyone agrees that we have free will in the legal sense, so when I say "free will" without a qualifier I'm referring to the philosophical sense. There are 4 main philosophical views of free will:

Hard Determinism: everything happens as a result of what happened before it. Free will is impossible because what we want to choose is determined by prior events & conditions.
Libertarianism: the universe is not deterministic. If it were, we wouldn't have free will. It is possible to make choices that are not determined by prior events & conditions.
Compatibilism: the universe is deterministic but we have free will. Free will only makes sense using the legal definition and it's pointless to talk about philosophical free will.
Hard Incompatibilism: whether the universe is deterministic or not, we wouldn't have free will either way.

People have debated free will for millennia and Bible-believers are no exception. According to Josephus, first century Jews were divided about it. The Essenes were hard determinists who believed that everything was determined by divine fate. The Sadducees were libertarians who denied divine fate and affirmed free will. The Pharisees' view was most similar to Compatibilism. They believed in divine fate for world events but also affirmed free will, particularly in spiritual matters, and their definition of it was more like the legal sense.

Many Christians today are either compatibilist (e.g., Calvinists) or libertarian (e.g., Arminians). Thanks to confirmation bias, it's not surprising that both believe the Bible clearly teaches their view. As readers of this blog might've guessed, I don't believe the Bible writers tried to settle this philosophical debate, so any such interpretation is overfitting. What the Bible does clearly teach is that at least some events are pre-ordained by God and that we make free choices (i.e., we have free will in the legal sense). Those teachings are consistent with all 4 views. Attempts at Bible interpretation on the topic of philosophical free will quickly abandon the original context and inevitably enter the realm of philosophy.

My biases make Hard Determinism (and Compatibilism, which I think is Hard Determinism that's afraid to admit it) very attractive to me. Weather is deterministic, and I like to think everything behaves similarly to weather -- maybe because it makes me feel like I have expertise in areas in which I really don't. I see a lot of beauty in deterministic systems, and Chaos Theory provides an excellent answer for why some things appear random or "free". Hard Determinism also is an attractive solution to the problem of evil. If God causes evil, it means evil has a purpose -- a greater good. God doesn't helplessly watch, wishing things were different. Hard Determinism allows for truly divine miracles that don't violate the fundamental laws of nature, demonstrating harmonious consistency in God's interaction with the world. Biological evolution also fits very nicely. And I can feel good when reading the many Bible passages that clearly imply determinism.

I do believe it's the view that is most consistent with the Bible (sorry, Arminian friends), but I must admit that my view is totally based on philosophy, science, and personal bias, not the Bible (sorry, Calvinist friends). What makes me doubt my view is not Libertarian proof texts in the Bible (I have answers for all of them, though not without confirmation bias). What really gives me doubt is quantum mechanics. The more I learn about it, the more I see Hard Incompatibilism and Libertarianism as interesting possibilities.

It's fun to talk about the free will debate as it relates to the God of the Bible. I think the debate would be a better one if we all could admit that it is in fact a philosophical (and perhaps scientific) debate -- one in which the writers of the Bible were not participating.

Sunday, March 30, 2014

The Religiosity of Bigfoot Believers

According to a Gallup survey of over 1,700 randomly selected people, approximately 17% of the U.S. population believes that creatures such as Bigfoot and the Loch Ness Monster will eventually be discovered by science. Belief in Bigfoot is an interesting way to look at religion, because it is essentially independent of any religious teachings. Bigfoot isn't supernatural. People generally don't believe in Bigfoot because of their religion, and they don't choose their religion according to their belief in Bigfoot.

Compared to people who don't believe in Bigfoot (whom I'll call "non-believers"), Bigfoot believers tend to have slightly lower income, slightly less education, and slightly more liberal political ideology, but the differences are fairly small. As one would expect, people who believe in Bigfoot (blue bars) are much more likely than Bigfoot non-believers (tan bars) to believe in a wide variety of things, including some that are supernatural.

Thus, one might also expect that Bigfoot believers would be more likely to believe in God and to be more religious in general. That's partly true. 91% of Bigfoot believers believe in God, compared to 87% of Bigfoot non-believers. They also are slightly more likely to believe in Heaven, Hell, angels, demons, and that Jesus is the Son of God. However, according to a wide variety of metrics, Bigfoot believers are substantially less religiously devout than Bigfoot non-believers.

Despite being slightly more likely to believe that God exists and that Jesus is his son, Bigfoot believers are substantially less likely to identify as Bible believing, born again, evangelical, and fundamentalist than people who don't believe in Bigfoot. They attend religious services, religious education, and prayer meetings less often. They also pray and read religious texts substantially less often than people who don't believe in Bigfoot.

Some people say that religious people are religious because they are gullible and willing to believe things for which there is no compelling scientific evidence. Whether that's true or not may depend on whether Bigfoot is real.

Sunday, March 16, 2014

The Gospel According to a Map

Has the messiah come? Christians say yes, and they usually use the gospels to "prove" it, showing that Jesus fulfilled messianic prophecies. But many of those were about relatively minor details (e.g., where he'd be born) that could've been fabricated by the gospel writers. There were, however, much bigger messianic prophecies, and we can verify them without using holy books of any religion.

Back around 730 BC, God's people were divided: Israel in the north, Judah in the south. Israel had recently been conquered and exiled by Assyria, and Judah was headed toward a similar fate via Babylon. Only a few people in Judah believed in God, even fewer in Israel, and practically nobody in the rest of the world. Other nations didn't care about Israel's God, because each one had its own gods. Israel and Judah were reviled and were essentially irrelevant in the world.

The prophet Isaiah offered hope to his people by telling them a king ("messiah") would come and establish a "kingdom" of unprecedented size and strength. Other nations would become followers of Israel's messiah and he would be a moral authority to them (Isa. 2:3). Through him, "the earth would be filled with the knowledge of Israel's God, as the waters cover the sea" (Isa. 11:9). Similar predictions were echoed by other prophets over the next couple centuries, but to no avail. Jerusalem was destroyed in 586 BC and its people exiled. The region was later conquered by Cyrus (Persians) in 539, Alexander the Great (Greeks) in 332, and finally by Pompey (Romans) in 63 BC. It probably seemed like the biblical prophecies would never be fulfilled.

Over 2000 years later, the world looks a lot different. Most of the popular gods of the ancient world (e.g., Baal, Dagon, El, Molech, Asherah, Osiris, Isis, Chemosh, Hadad, Artemis, Zeus, and Caesar) are no longer worshipped. Others are mostly confined to particular regions. But there is one glaring exception. According to polls by Pew Research, the majority of people in the world (55%, including 81% outside of China & India) are, at least nominally, followers of the God of Israel. The following map shows countries (in blue) where the majority of the adult population professes Judaism, Christianity, or Islam as their religion.

As someone who makes over 600,000 weather predictions every day, I know well that correct predictions aren't necessarily evidence of divine revelation. But I also know that consistently accurate predictions always are based on analysis of past data, accurate assessment of current conditions and/or recent trends, a correct understanding of how the universe works, or some combination thereof. These explain how meteorologists can make (somewhat) accurate predictions of future weather, futurists and science fiction writers can predict future inventions, and political analysts can (sometimes) predict the next president. But they don't explain the messianic prophecies.

There is nothing to suggest that the unprecedented events that were prophesied were logical inferences from the available data at the time. Quite the contrary! The data pointed much more toward Israel and Judah being destroyed like most of their neighbors and their God ending up like Baal, Chemosh, Asherah, and the others, as minor footnotes in history.

We don't need to take the Bible's word for it. These are well-attested historical facts, as is the fact that the prophetic books were written long before Jesus was born. It's possible that the messianic prophecies were extremely lucky guesses. Verification of them is not proof of messiahship or divine revelation, and parts of them have not yet been completely fulfilled. But if "evidence" means a body of facts that is more probable if the hypothesis is true than if it is not true, I consider it strong evidence.

Saturday, March 8, 2014

Interpreting the Hebrew Bible with Artificial Intelligence

I often hear the question "Do you interpret the Bible literally or figuratively?" The answer is "both" and "neither", mostly "neither". The Bible contains different genres of writing, which should be interpreted accordingly. They include history, prophecy, poetry/songs, stories, and wisdom literature, to name just a few. Identifying the genre is important, but it can be very subjective. It's also difficult without understanding the original language. Those problems can be solved with machine learning.

I developed an algorithm to interpret the Bible in its original language. I started by writing a Perl script that parses the BHS Hebrew text, removes vowel points, and identifies every word used at least 50 times in the Bible. I also removed stop words (i.e., common irrelevant words such as ani [I], at/atah [you], mah [what], etc.). Keep in mind that in Hebrew some articles & prepositions are prefixes rather than distinct words (e.g., "land" = aretz, "the land" = haaretz, "in the land" = bearetz). The final list included 560 Hebrew words. I calculated the relative frequency of each word (i.e., how often the word is used compared to the other 559 words), then standardized the values. The result was 560 numeric variables, each representing a sufficiently common and sufficiently relevant Hebrew word.
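The preprocessing steps above can be sketched as follows. This is a Python illustration, not the original Perl script: the function names are hypothetical, the chapter contents are toy transliterated tokens rather than BHS text, and "relative frequency" is assumed to mean each word's share among the retained vocabulary words in a chapter. Prefix handling (ha-, be-, etc.) is left out.

```python
import unicodedata
from collections import Counter
from statistics import mean, pstdev

def strip_points(text):
    """Remove Hebrew vowel points and cantillation marks,
    which Unicode classifies as nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def build_vocabulary(chapters, min_count, stop_words):
    """Words used at least `min_count` times overall, minus stop words."""
    totals = Counter(word for words in chapters.values() for word in words)
    return sorted(w for w, n in totals.items()
                  if n >= min_count and w not in stop_words)

def relative_frequencies(words, vocabulary):
    """Each vocabulary word's share among the vocabulary words in one chapter."""
    wanted = set(vocabulary)
    counts = Counter(w for w in words if w in wanted)
    total = sum(counts.values()) or 1      # avoid dividing by zero
    return [counts[w] / total for w in vocabulary]

def standardize(rows):
    """Column-wise z-scores; a constant column is left at zero."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    deviations = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, deviations)]
            for row in rows]

# "haaretz" with vowel points, written as escapes so the marks are explicit
pointed = "\u05d4\u05b8\u05d0\u05b8\u05e8\u05b6\u05e5"
unpointed = strip_points(pointed)

# Toy chapters (transliterated); the real threshold in the text is 50 uses
chapters = {
    "A": "melech aretz melech ani yom".split(),
    "B": "aretz yom yom ani chesed".split(),
}
vocabulary = build_vocabulary(chapters, min_count=2, stop_words={"ani"})
rows = [relative_frequencies(words, vocabulary) for words in chapters.values()]
z_scores = standardize(rows)
print(vocabulary)
print(z_scores)
```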

560 variables is too many to easily work with, so I used Principal Component Analysis to reduce it to a few manageable variables, each of which was a linear combination of the standardized relative frequencies of all 560 words. To understand what the principal components mean, I plotted chapters of books of the Bible with obvious/known genres: History (e.g., 1 & 2 Chronicles), Prophecy (e.g., Isaiah), and Wisdom Literature (e.g., Proverbs). Each dot on the graph represents a chapter where the genre of the book (though not necessarily of the chapter) is known.

The first two principal components do an excellent job of separating the books of different genres! The first (PC1) seems to indicate how historical vs. poetic it is. The lowest value (-14.8) is for 2 Chronicles 27, a very historical chapter detailing the reign of King Jotham. The highest value (4.5) is for Psalm 21, a very poetic song. PC2 measures another dimension that (at least in theory) is not related to how historical/poetic a book is. It does a great job of distinguishing between prophecy and wisdom literature. The big outlier among the prophetic books (red triangle on the left side of the blue cluster, at PC1=-9.3, PC2=0.9) happens to be Jeremiah 52, which is a very historical chapter despite being in a prophetic book.
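For readers unfamiliar with Principal Component Analysis, here is a minimal standard-library sketch of the idea: center the data, compute the covariance matrix, and find the directions of greatest variance (here by power iteration with deflation, which is just one of several ways to do it). The function names and the toy matrix are hypothetical, and a real analysis of 560 columns would normally use a numerical library instead.

```python
import math

def center(rows):
    """Subtract each column's mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly multiply by the matrix and renormalize."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]          # fixed, asymmetric start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the matching eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca(rows, k):
    """Return (components, scores): the top-k directions and each row's
    coordinates along them. The sign of each component is arbitrary."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        eigenvalue, v = leading_eigenvector(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return components, scores

# Toy matrix: 5 "chapters" x 3 standardized word frequencies (made up)
toy = [[-1.2, 0.8, 0.1], [-0.9, 1.1, -0.2], [0.3, -0.4, 1.0],
       [0.8, -0.7, -1.1], [1.0, -0.8, 0.2]]
components, scores = pca(toy, k=2)
for row in scores:
    print([round(value, 2) for value in row])
```

Each row of `scores` plays the role of a chapter's (PC1, PC2) coordinates on the kind of graph described above.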

PC1 and PC2 also were calculated for entire books and for chapters/books of unknown genres. Those can be plotted on the same graph to visualize how similar they are to the known genres. For example:

For a more quantitative genre classification, I built a Logistic Regression model using the first 6 principal components. The model estimates the probability that a writing belongs to one of the three broad genres, assuming those are the only three options. As an example, I applied it to each chapter of Genesis and plotted the output below:

According to the model, the first, third, and fifteenth chapters are by far the least historical, which might disappoint some who interpret Genesis 1 as a scientific or historical narrative. The biggest outlier, however, is chapter 15, which the model thought was very likely prophetic. Indeed, chapter 15 is about God's covenant with Abram and includes several prophecies about the future.
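A three-genre classifier of this kind can be sketched as a multinomial (softmax) logistic regression trained by plain gradient descent. Everything specific here is an assumption for illustration: the function names, the training settings, the use of two components instead of six, and the toy (PC1, PC2) values, which are not the actual component scores.

```python
import math

GENRES = ["history", "prophecy", "wisdom"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, features):
    """Probability of each genre for one chapter's component scores."""
    x = [1.0] + list(features)              # leading 1.0 is the intercept
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def train(samples, labels, n_classes, epochs=2000, rate=0.1):
    """Batch gradient descent on the multinomial cross-entropy loss."""
    n_features = len(samples[0]) + 1
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * n_features for _ in range(n_classes)]
        for features, label in zip(samples, labels):
            probs = predict_proba(weights, features)
            x = [1.0] + list(features)
            for k in range(n_classes):
                error = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    grads[k][j] += error * x[j]
        for k in range(n_classes):
            for j in range(n_features):
                weights[k][j] -= rate * grads[k][j] / len(samples)
    return weights

# Toy (PC1, PC2) values with known genres; the numbers are made up
samples = [(-3.0, 0.0), (-2.5, 0.5), (-3.5, -0.5),    # history
           (1.0, 1.5), (1.5, 2.0), (0.5, 1.0),        # prophecy
           (1.0, -1.5), (1.5, -2.0), (0.5, -1.0)]     # wisdom
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

weights = train(samples, labels, n_classes=len(GENRES))
new_chapter = (-2.8, 0.2)
for genre, probability in zip(GENRES, predict_proba(weights, new_chapter)):
    print(f"{genre}: {probability:.2f}")
```

As in the text, the probabilities are conditional on the three listed genres being the only options.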

K-Means Clusters, Hebrew Bible

Classification into these broad genres is only the beginning. If other genres, writing styles, authors, topics, etc. can be identified, another model could easily be built to classify writings according to those, using the same principal components calculated here. If none of those things are known, cluster analysis can be used to identify writings that have various features in common (see example on the right).
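A bare-bones version of k-means clustering, the method named in the figure caption, might look like the sketch below. The function names, the naive choice of starting centroids (the first k points), and the toy coordinates are assumptions for illustration only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [list(p) for p in points[:k]]   # naive start: first k points
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:        # nothing changed: converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                         # keep old centroid if empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment, centroids

# Toy (PC1, PC2) coordinates for six chapters; the values are made up
points = [(-9.0, 0.0), (4.0, 1.0), (-8.0, 0.5),
          (3.0, 0.5), (-10.0, -0.5), (4.5, 1.5)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

The cluster numbers carry no meaning on their own; interpreting a cluster still requires looking at which writings landed in it.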

My plan (if I ever get enough free time) is to set up a web page where anyone can easily get the classification values for each chapter of each book. We may never get to a point where computers and algorithms can accurately interpret the Bible for us, but they certainly can be helpful.