
Connectome as a Book


Your Connectome is a map of your brain.  Every neuron, every synapse.

I am only a few pages into Connectome, but was intrigued by a sentence: "Human DNA....has three billion letters....would be a million pages long if printed as a book."  The companion question, "How many pages for the Connectome?" might be answered later in the book, but I thought I would take a shot at it here.

Here is the punchline: Your Connectome book is 6.7 million times longer than your DNA book.

That human DNA is about a million pages is not too surprising, although it probably is not optimized. According to Quora there are between 1500 and 1800 letters per page.  I am going to use a round number, namely 2000.  Then, the 3x10^9 DNA letters would actually be 1.5 million pages.  But this is very wasteful.  Each DNA letter needs only 2 bits, so with 8-bit characters we can encode four DNA letters per character, and the book should really only be about 400K pages.  And, this book is much more interesting; instead of endless GATC's, you get a full 256 character set to work with. Further compression is definitely possible, and in fact, up to 99% compression has been shown. This is largely due to repetitive structures in DNA that can be encoded efficiently.  So, taking 1% of the original 1.5 million pages, we can write the DNA book with only 15,000 pages.  (Robert Jordan's series The Wheel of Time is 11,000 pages and probably a little bit more readable.)
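The page counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic in this post; the 2-bit code assigned to each letter and the `pack` helper are assumptions made for illustration, not a standard DNA file format.

```python
LETTERS = 3 * 10**9        # DNA letters, from the quoted sentence
LETTERS_PER_PAGE = 2000    # the round number used in this post

raw_pages = LETTERS // LETTERS_PER_PAGE            # one letter per character
packed_pages = LETTERS // 4 // LETTERS_PER_PAGE    # four 2-bit letters per 8-bit character
compressed_pages = raw_pages // 100                # 99% compression of the raw book

# Hypothetical 2-bit code for each letter; any fixed assignment would do.
CODE = {"G": 0, "A": 1, "T": 2, "C": 3}

def pack(bases):
    """Pack a string of G/A/T/C letters into bytes, four letters per byte."""
    out = bytearray()
    for i in range(0, len(bases), 4):
        chunk = bases[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # zero-pad a short final chunk
        out.append(byte)
    return bytes(out)

print(raw_pages, packed_pages, compressed_pages)
```

The packed figure comes out at 375,000 pages, which is the "about 400K" quoted above.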

Now, the Connectome, according to Wikipedia, has 10^10 neurons and 10^14 synapses.  So, on average, there are 10,000 synapses per neuron.  If we imagine our Connectome book to be (somewhat) efficiently coded, we could do the following.  With 10^10 neurons we need 34 bits to encode the "neuron address" (2^33 is only about 8.6x10^9, which is not quite enough).  I am going to assume that neurons have a "local space" (i.e., are connected to other neurons that are physically close by), and can be addressed relative to themselves with only 32 bits (if a connection is out of local space, we can use an escape sequence and a full 34 bit address).  These 32 bits give us a nice round 4 characters per synapse.  Our encoding is then the following: <marker><neuron number><neuron for synapse 1>....<neuron for synapse 10,000>.
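Here is a rough sketch of that record layout in Python. The one-byte marker, the 5-byte field for the full neuron number, and the `encode_record` helper are all assumptions made for illustration; the escape sequence for out-of-range connections is left out.

```python
import struct

NEURONS = 10**10
SYNAPSES = 10**14
CHARS_PER_PAGE = 2000

address_bits = (NEURONS - 1).bit_length()    # bits needed for a full neuron address
synapses_per_neuron = SYNAPSES // NEURONS    # average number of connections
naive_pages = SYNAPSES * 4 // CHARS_PER_PAGE # 4 characters per synapse, headers ignored

MARKER = b"\xff"  # hypothetical one-byte record marker

def encode_record(neuron, targets):
    """Encode one neuron: marker, full address, then 32-bit relative offsets."""
    head = MARKER + neuron.to_bytes(5, "big")  # a 34-bit address fits in 5 bytes
    # struct.pack raises struct.error if an offset does not fit in a signed
    # 32 bits; that is where the escape sequence would be needed.
    body = b"".join(struct.pack(">i", t - neuron) for t in targets)
    return head + body

record = encode_record(1000, [990, 1005, 2000])
print(address_bits, synapses_per_neuron, naive_pages, len(record))
```

At 4 characters per synapse, the book would run to about 2x10^11 pages, which is the number the next step tries to shrink.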

But this is still highly wasteful.  Let's sort each neuron's connections by neuron address and then use a differential encoding, storing only the gap from one connected address to the next. With 10^10 neurons, and 10^4 connections per neuron, the average "distance" between consecutive connections will be just 10^6, or a measly million. This takes only 20 bits to encode, or 2.5 characters.  Let's round down to 2 characters per synapse, assuming more compression is possible.  (I am assuming that there is no repetitive structure in the Connectome, so while some more compression is possible, it is probably not going to be much).  Now we have 2 * 10^14 / 2000 = 10^11 pages in our Connectome book.  That is a hundred billion pages, or approximately 6.7 million times as many pages as the 15,000-page DNA book.
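The differential encoding and the final page count can be sketched the same way. The `delta_encode` and `delta_decode` helpers are illustrative names, and the sketch assumes connections are spread evenly over the whole address space, as the estimate above does.

```python
from itertools import accumulate

NEURONS = 10**10
SYNAPSES_PER_NEURON = 10**4
SYNAPSES = 10**14
CHARS_PER_PAGE = 2000
DNA_PAGES = 15_000

average_gap = NEURONS // SYNAPSES_PER_NEURON        # average distance between connections
gap_bits = (average_gap - 1).bit_length()           # bits needed for an average gap
connectome_pages = 2 * SYNAPSES // CHARS_PER_PAGE   # 2 characters per synapse
ratio = connectome_pages / DNA_PAGES                # Connectome book vs. DNA book

def delta_encode(targets):
    """Sort the target addresses and keep the first one plus successive gaps."""
    ordered = sorted(targets)
    return ordered[:1] + [b - a for a, b in zip(ordered, ordered[1:])]

def delta_decode(deltas):
    """Rebuild the sorted addresses by summing the gaps."""
    return list(accumulate(deltas))

deltas = delta_encode([5_000_000, 1_000_000, 3_000_000])
print(average_gap, gap_bits, connectome_pages, round(ratio / 1e6, 1), deltas)
```

Sorting loses the original order of the connections, which is fine here since only the connection graph matters.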

Amazon can deliver one, but might struggle with the other.


Of course, I have used the assumption that, just like folding is ignored in the DNA Book, the XYZ coordinates of synapses are not required for the Connectome, just the connection graph.








