
Connectome as a Book


Your Connectome is a map of your brain.  Every neuron, every synapse.

I am only a few pages into Connectome, but I was intrigued by a sentence: "Human DNA....has three billion letters....would be a million pages long if printed as a book."  The companion question, "How many pages for the Connectome?", might be answered later in the book, but I thought I would take a shot at it here.

Here is the punchline: Your Connectome book is 6.7 million times longer than your DNA book.

That human DNA is about a million pages is not too surprising, although the estimate is probably not optimized. According to Quora, a printed page holds between 1500 and 1800 letters.  I am going to use a round number, namely 2000.  Then the 3x10^9 DNA letters would actually fill 1.5 million pages.  But this is very wasteful.  DNA has only four letters, so each one needs just 2 bits, and we can pack four DNA letters into a single 8-bit character; the book should really only be about 400K pages.  And this book is much more interesting: instead of endless GATC's, you get a full 256-character set to work with. Further compression is definitely possible, and in fact up to 99% compression has been shown. This is largely due to repetitive structures in DNA that can be encoded efficiently.  Taking 99% off the 1.5 million pages, we can write the DNA book in only 15,000 pages.  (Robert Jordan's series The Wheel of Time is 11,000 pages and probably a little bit more readable.)
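The DNA arithmetic above can be checked with a few lines of Python. This is only a sketch: the 2-bit code assigned to each letter and the helper names `pages` and `pack` are my own illustrative choices, and the 99% figure is simply applied to the unpacked page count, as in the text.

```python
LETTERS_PER_PAGE = 2000   # round number used in the post
DNA_LETTERS = 3 * 10**9   # letters in human DNA, per the quoted sentence

# Hypothetical 2-bit code for each base; any fixed assignment would do.
CODE = {"G": 0, "A": 1, "T": 2, "C": 3}


def pages(characters, per_page=LETTERS_PER_PAGE):
    """Pages needed for a given number of characters, rounding up."""
    return -(-characters // per_page)


def pack(bases):
    """Pack a string of bases into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(bases), 4):
        chunk = bases[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so every byte holds four slots.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


naive_pages = pages(DNA_LETTERS)         # one base per character
packed_pages = pages(DNA_LETTERS // 4)   # four bases per 8-bit character
compressed_pages = naive_pages // 100    # 99% reduction of the naive count

print(naive_pages, packed_pages, compressed_pages)
```

The packed count comes out to 375,000 pages, which the text rounds to 400K.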

Now, the Connectome, according to Wikipedia, has 10^10 neurons and 10^14 synapses.  So, on average, there are 10,000 synapses per neuron.  If we imagine our Connectome book to be (somewhat) efficiently coded, we could do the following.  With 10^10 neurons we need 34 bits to encode the "neuron address", since 2^33 is about 8.6x10^9 and falls just short.  I am going to assume that neurons have a "local space" (i.e., they are connected to other neurons that are physically close by), and that each connection can be addressed relative to the neuron itself with only 32 bits (if a connection is out of local space, we can use an escape sequence and a full 34-bit address).  These 32 bits give us a nice round 4 characters per synapse.  Our encoding is then the following: <marker><neuron number><neuron for synapse 1>....<neuron for synapse 10,000>.
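As a rough check of these numbers, here is a minimal sketch in Python. The page count for this first, naive encoding is not stated in the post; it is my own extrapolation from the figures above, and it ignores the marker, the neuron number, and any escape sequences.

```python
NEURONS = 10**10          # figure quoted in the post
SYNAPSES = 10**14         # figure quoted in the post
LETTERS_PER_PAGE = 2000   # round number used for the DNA book
LOCAL_ADDRESS_BITS = 32   # the post's assumed "local space" address size

synapses_per_neuron = SYNAPSES // NEURONS

# Bits needed to give every neuron a distinct address (0 .. NEURONS - 1).
full_address_bits = (NEURONS - 1).bit_length()

# One 8-bit character per byte of address.
chars_per_synapse = LOCAL_ADDRESS_BITS // 8

# Assumption: per-neuron overhead (marker, neuron number) is negligible
# next to 10,000 synapse entries, so only the synapse entries are counted.
naive_chars = SYNAPSES * chars_per_synapse
naive_pages = naive_chars // LETTERS_PER_PAGE

print(synapses_per_neuron, full_address_bits, chars_per_synapse, naive_pages)
```

Under these assumptions the naive encoding would run to about 2x10^11 pages.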

But this is still highly wasteful.  Let's sort each neuron's connections by neuron address and then use a differential encoding, storing only the gap between one address and the next. With 10^10 neurons and 10^4 connections per neuron, the average "distance" between consecutive addresses will be just 10^6, or a measly million. This takes only 20 bits to encode, or 2.5 characters.  Let's round down to 2 characters per synapse, assuming more compression is possible.  (I am assuming that there is no repetitive structure in the Connectome, so while some more compression is possible, it is probably not going to be much.)  Now, at 2 characters for each of the 10^14 synapses and 2000 characters per page, we have 2 * 10^14 / 2000 = 10^11 pages in our Connectome book.  That is a hundred billion pages, or approximately 6.7 million times as many pages as the 15,000-page DNA book.
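The differential encoding and the final page count can be sketched as follows. The function names `delta_encode` and `delta_decode` are hypothetical, and the example addresses are made up for illustration; the sketch assumes the gaps are spread evenly, as the averaging argument does.

```python
from itertools import accumulate

NEURONS = 10**10
SYNAPSES = 10**14
LETTERS_PER_PAGE = 2000
DNA_PAGES = 15_000   # compressed DNA book from earlier in the post


def delta_encode(targets):
    """Sort target addresses and store each as the gap from the previous one."""
    gaps = []
    previous = 0
    for address in sorted(targets):
        gaps.append(address - previous)
        previous = address
    return gaps


def delta_decode(gaps):
    """Recover the sorted addresses by taking running sums of the gaps."""
    return list(accumulate(gaps))


# Made-up example: three connections of one neuron.
example = [5_000_000, 1_000_000, 3_000_000]
encoded = delta_encode(example)

# Average gap when 10^4 targets are spread over 10^10 addresses.
average_gap = NEURONS // (SYNAPSES // NEURONS)
gap_bits = (average_gap - 1).bit_length()

# The post rounds 2.5 characters down to 2 per synapse.
connectome_pages = 2 * SYNAPSES // LETTERS_PER_PAGE
ratio = connectome_pages / DNA_PAGES

print(encoded, average_gap, gap_bits, connectome_pages, round(ratio / 1e6, 1))
```

Gaps larger than the average would need more than 20 bits, so a real encoding would need some variable-length or escape scheme; the post folds that into its rounding.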

Amazon can deliver one, but might struggle with the other.


Of course, I have assumed that, just as folding is ignored in the DNA book, the XYZ coordinates of synapses are not required for the Connectome book, just the connection graph.








