All posts in The Polymaths
Science 2.0: Disrupting the publication-grant model of scientific research and introducing the network effects of distributed, open-source collaboration
Do you think Maurice Wilkins did the right thing in showing Watson the x-ray diffraction image produced by Franklin? Why or why not?
It was ethically wrong, but it changed the world. Ultimately, it was Franklin’s intellectual property: it was her work and she had every right to its ownership. However, I have a greater concern than the ethical and property-rights argument. To encourage participation and synergy across departments and universities, as Wilkins had hoped, one needs to fundamentally change the incentive system in science. Historically, science and scientists have depended on the publication-grant model to acquire funding and advance our understanding of the world. But why should publishers be the gatekeepers of innovation and invention? Why should scientists have to spend half the year writing grants to acquire funding, when they could spend that time doing groundbreaking research? In the tech industry, GitHub.com and SourceForge.net have become the repositories for open-source innovation. Engineers, scientists, designers and business leaders collaborate on these sites to create meaningful change. Projects can be created spontaneously to solve specific problems and attract talent from across the world. Some of the world’s most powerful and useful software has been generated in an open-source manner (e.g. Mozilla Firefox, one of the world’s most widely used web browsers; Linux, a free operating system; osCommerce, an e-commerce platform repackaged as Magento; and open-source PBX software, the call-routing systems behind most toll-free automated phone services). If the same system were available to scientists, imagine the possibilities. The incentive system would be obvious: contribute to the project’s overall knowledge (in the form of imaging, lines of code, experimental data, etc.), have your time investment accounted for, and when the project finds investment capital, be paid appropriately (as a factor of time invested and the importance of your individual contributions). I read an interesting article in the WSJ about this idea (http://on.wsj.com/uHZmqs).
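As a thought experiment, here is a minimal sketch of what that contribution ledger could look like. Everything here is my own assumption, not an existing system: the weighting scheme (hours times an importance score) and all the names and figures are illustrative.

```python
# Toy payout model: each contributor's share of incoming capital is
# proportional to (hours invested) x (importance of the contribution).
# The weighting scheme and all figures below are illustrative assumptions.

def payouts(contributions, capital):
    """contributions: list of (name, hours, importance in 0..1)."""
    scores = {name: hours * importance for name, hours, importance in contributions}
    total = sum(scores.values())
    return {name: capital * score / total for name, score in scores.items()}

shares = payouts(
    [("alice", 120, 0.9),   # core experimental data
     ("bob",    40, 0.5),   # imaging
     ("carol",  20, 0.2)],  # code cleanup
    capital=100_000,
)
print(shares)  # entire capital split in proportion to each score
```

The interesting design question is who sets the importance weights; in an open-source setting they could be assigned by peer review of merged contributions, GitHub-style.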
Venture capitalists get excited and fund Internet startups that are trying to crowdsource ice cream trucks and route them to the right neighborhoods to find business. This stuff can raise as much as $5 million in Series A funding, and it doesn’t change the world one bit. I look forward to the day scientists ditch their labs, join the DIY Biology movement, get some 3D printers and a basic PCR machine, and start hacking on problems of their own. I have a feeling that, much like computer programming became a decentralized activity with the advent of personal computers, synthetic biology, 3D printing and basic PCR machines will be the future of scientific research. Ultimately, individuals are far more capable of making transformative changes than bureaucratic organizations. Google was a couple of college kids on their thousand-dollar laptops. Steve Jobs’ Apple Computer and Bill Gates’ Microsoft are the products of two college dropouts who built personal computers that everyone could use. The tools of innovation, of disruptive change, are in everybody’s hands, and they are not expensive. One just has to be courageous enough to give up a comfortable job and change the world.
More investors need to start thinking about creating technology accelerators to advance science and bring some major breakthroughs to market. There are some early entrants like Singularity University’s SynBio Launchpad (a synthetic biology startup accelerator – http://singularityu.org/synbio/), Peter Thiel’s Founders Fund, John Doerr’s Kleiner Perkins Caufield & Byers, and Steve Jurvetson’s Draper Fisher Jurvetson that are following this model and will, in my humble and honest opinion, find great success.
In what ways did competition between the scientists help or hinder the progress to understanding the structure of DNA?
Competition definitely helped progress toward the discovery of the structure of DNA. This is a highly philosophical argument, but when you bring together the best and brightest minds in any field, give them one of the world’s biggest challenges, and make them compete against each other, you add the key ingredient to making massive amounts of progress. That ingredient is momentum. Case in point: the Human Genome Project and Celera Genomics competing to crack the order of the three billion base pairs that comprise the human genome. Competition is the key ingredient to get people to do a lot of work, very quickly. I realize this philosophy does not agree with academia’s idealistic and utopian notions of sharing and collaboration, but it often works better than academic collaboration, because it incentivizes divergent thinking, speed of thought, and the ability to pivot and try something new. Competition is transformative; it is the intersection at which passion meets momentum. When you are competing against someone, you start to realize that you’re not the only one around the bend and that you’ve got to win the race before they do. If Watson and Crick had collaborated with Franklin and Wilkins, their distinct methodology, their philosophy of the world, and the creative synergy between Watson and Crick would have been lost. Their free-spirited nature, their creative endeavors, and their interdisciplinary attitude towards learning and knowledge assimilation would have been canceled out by the work ethic of Franklin and Wilkins. One needs to encourage divergent thinking rather than groupthink, because the synergy between individuals who work well with each other creates an energy, a passion, that will always outpace the often compromised output of groupthink. Companies like IDEO tap into this philosophy to continually bring innovative and game-changing products to market.
It may be true that Franklin and Wilkins eagerly and painstakingly amassed the evidence for the DNA double helix, but their minds were not quick to assimilate that evidence into meaningful information. They had all the data but could not imagine the structure, which is a key point that should not be overlooked. Franklin was a great scientist, but she was too rigid to really break out of the world of data accounting and analysis. Perhaps, if she had been open to synergistic collaboration, she would have cracked the code of life before Watson and Crick. Undoubtedly, Franklin was a mastermind and a very deliberate, analytical thinker, but she was uncomfortable with the notion of being wrong. The key to being great is to overcome your own ego, encourage divergent thinking, establish a creative synergy with your closest collaborators, resist dogma, believe in your own ideas, and change the world.
Who do you think should get the credit for solving the structure of DNA? Provide some evidence to support your answer.
I think Watson and Crick should get the credit, because ultimately they were able to synthesize and assimilate the raw data into meaningful information faster than Franklin and Wilkins. Again, Franklin was extremely close, and she might have cracked the code before them if she had had a better collaborator.
When Crick invited Franklin to come look at their early prototype of the DNA spiral, she laughed at them. They had got it inside out: Watson’s failure to take notes during Franklin’s departmental research meeting had let them down, and they had, very stupidly, not accounted for the water content of the structure. She dismissed their silly attempt to build a model before they had any idea what the data meant. But it is precisely this exploratory, experimental and creative attitude that would finally allow Watson and Crick to figure out the structure. Perhaps, if Franklin had been open to synergistic collaboration, she would have cracked the code of life first. However, given that she was a woman in the 1950s world of all-male scientists, one can imagine why she did not have this opportunity. There is a part in the movie when Franklin’s lab assistant recalls Wilkins commenting that instead of “trotting along” trying to map the structure of DNA using the diffraction pattern, “we should try other things” because we “are in a race” and this is “a big problem.” Clearly, Wilkins had the right ideas to make progress but was in the wrong environment to give them any momentum. It was a toxic environment for science and for creative thinking.
When Watson shared the picture he had gotten from Wilkins, Crick was immediately able to dissect the structure from the diffraction pattern. Because Crick was an interdisciplinary thinker, he knew just by looking at the picture that the X-shaped pattern meant a helix. Fortunately, Chargaff had also shared that A and T, and C and G, occur in equal concentrations. By combining these sources of knowledge, they were able to imagine the structure much faster than Franklin: it had to be a double helix composed of complementary base pairs (A-T; G-C).
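The base-pairing rule they landed on is simple enough to state in a few lines of code (a minimal illustration of Watson-Crick pairing, not anything from the original post):

```python
# Watson-Crick base pairing: A pairs with T, and G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the complementary strand, read in the same direction."""
    return "".join(PAIR[base] for base in strand)

strand = "ATGCCGTA"
print(complement_strand(strand))  # TACGGCAT
```

Chargaff’s equal-concentration observation falls out of this rule automatically: across the two strands of the duplex, every A is matched by a T and every G by a C.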
For more reading:
A classical computer has a memory made up of bits, where each bit represents either a one or a zero. A quantum computer maintains a sequence of qubits. A single qubit can represent a one, a zero, or, crucially, any quantum superposition of these; moreover, a pair of qubits can be in any quantum superposition of 4 states, and three qubits in any superposition of 8. In general a quantum computer with n qubits can be in an arbitrary superposition of up to 2^n different states simultaneously (compare this to a normal computer, which can only be in one of these 2^n states at any one time). A quantum computer operates by manipulating those qubits with a fixed sequence of quantum logic gates. The sequence of gates to be applied is called a quantum algorithm.
An example of an implementation of qubits for a quantum computer could start with the use of particles with two spin states: “down” and “up” (typically written |↓⟩ and |↑⟩, or |0⟩ and |1⟩). But in fact any system possessing an observable quantity A that is conserved under time evolution and has at least two discrete, sufficiently spaced consecutive eigenvalues is a suitable candidate for implementing a qubit. This is true because any such system can be mapped onto an effective spin-1/2 system.
A quantum computer with a given number of qubits is fundamentally different from a classical computer composed of the same number of classical bits. For example, to represent the state of an n-qubit system on a classical computer would require the storage of 2^n complex coefficients. Although this fact may seem to indicate that qubits can hold exponentially more information than their classical counterparts, care must be taken not to overlook the fact that the qubits are only in a probabilistic superposition of all of their states. This means that when the final state of the qubits is measured, they will only be found in one of the possible configurations they were in before measurement. However, it is incorrect to think of the qubits as only being in one particular state before measurement, since the fact that they were in a superposition of states before the measurement was made directly affects the possible outcomes of the computation.
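The state-vector picture described above can be sketched in plain Python (this simulates the bookkeeping on a classical machine, which is exactly the exponential cost the text mentions): n qubits require 2^n complex amplitudes, and measurement yields a single basis state with probability equal to the squared magnitude of its amplitude.

```python
import random
from math import sqrt, isclose

n = 3                                   # three qubits
state = [0j] * 2**n                     # 2^3 = 8 complex amplitudes
state[0b000] = 1 / sqrt(2)              # equal superposition of |000> ...
state[0b111] = 1 / sqrt(2)              # ... and |111>

probs = [abs(a) ** 2 for a in state]    # Born rule: probability = |amplitude|^2
assert isclose(sum(probs), 1.0)

# Measurement collapses the register to exactly one of the 2^n basis states.
outcome = random.choices(range(2**n), weights=probs)[0]
print(format(outcome, "03b"))           # "000" or "111", never anything else
```

Eight complex numbers are trivial to store, but at n = 50 qubits the same bookkeeping needs 2^50 amplitudes, which is why classical simulation breaks down.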
Qubits are made up of controlled particles and the means of control (e.g. devices that trap particles and switch them from one state to another).
For example: Consider first a classical computer that operates on a three-bit register. The state of the computer at any time is a probability distribution over the 2^3 = 8 different three-bit strings 000, 001, 010, 011, 100, 101, 110, 111. If it is a deterministic computer, then it is in exactly one of these states with probability 1. However, if it is a probabilistic computer, then there is a possibility of it being in any one of a number of different states. We can describe this probabilistic state by eight nonnegative numbers a, b, c, d, e, f, g, h (where a = probability computer is in state 000, b = probability computer is in state 001, etc.). There is a restriction that these probabilities sum to 1.
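The classical three-bit register above, written out as code (the particular probability values are arbitrary examples):

```python
import random

# Probabilistic state of a 3-bit register: eight nonnegative numbers
# (the a..h of the text) that must sum to 1.
p = [0.5, 0.125, 0.125, 0.0, 0.0, 0.125, 0.125, 0.0]
assert abs(sum(p) - 1.0) < 1e-12

# A deterministic register is the special case where one entry is 1.
deterministic = [0.0] * 8
deterministic[0b010] = 1.0

# Reading the register yields one concrete bit string, drawn according to p.
state = random.choices(range(8), weights=p)[0]
print(format(state, "03b"))
```

The contrast with the quantum case is that here the eight numbers are real probabilities, whereas a qubit register carries complex amplitudes that can interfere before measurement.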
– Right now people episodically sequence their genomes (say, every year) to find new problems. Every year new discoveries are made as the number of sequenced genomes grows.
BUT it would be better to store every person’s genome so that, as the intelligence inside the database grows, problems can be identified in individual genomes and those individuals can be emailed to alert them to their genomic defects.
The upside of having an aggregate global genomic database is that random errors (and rare genetic mutations) are averaged out. Thus, one can use the global genomic database as the benchmark against which to compare individual genomes.
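A toy version of that comparison, with made-up data: build a consensus sequence from the aggregate database (rare random errors get voted out position by position), then flag where an individual departs from it.

```python
from collections import Counter

# Hypothetical aggregate database: several sequenced genomes over a
# short region (real genomes are billions of bases, of course).
population = ["ATGCCGTA", "ATGCCGTA", "ATGACGTA", "ATGCCGTA"]

# Consensus base at each position = the population-wide benchmark;
# the lone A at position 3 is outvoted as a likely random error.
consensus = "".join(
    Counter(column).most_common(1)[0][0] for column in zip(*population)
)

individual = "ATGCCTTA"
flagged = [(i, consensus[i], individual[i])
           for i in range(len(consensus)) if individual[i] != consensus[i]]
print(flagged)   # positions where this genome departs from the benchmark
```

The same majority-vote logic is what lets a global database distinguish a one-off sequencing error from a variant shared by a whole subpopulation.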
Intel Inside on Global Human Genome
– Comparing your genome against the global database of everyone’s genomic data to find problems in your genome
Unmet need: Relational databases for genomics?
Clinical Genomics and Research Genomics:
– Clinical genomics will need 10^-7 accuracy, i.e. 600 errors out of 6 billion base pairs, or less than one error per gene. Right now it is at 10^-5.
– Research genomics can make do with more errors, because researchers can merge data from other data sources to statistically average the errors out
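The back-of-the-envelope arithmetic behind those targets (the 6 billion figure is the diploid genome; the ~20,000 gene count is my own added assumption):

```python
genome_size = 6_000_000_000     # diploid human genome, base pairs
gene_count = 20_000             # rough human gene count (assumption)

clinical_errors = genome_size * 1e-7    # 600 errors per genome
research_errors = genome_size * 1e-5    # 60,000 errors at today's rate

print(clinical_errors, research_errors)
print(clinical_errors / gene_count)     # well under one error per gene
```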
Intel Inside on Genomics as Platform as Service:
– Imagine a PaaS model where companies build solutions (listed below) using the global genomics database
Intel Inside on Genomics as Software as Service:
– Imagine a SaaS model where users log in to see trends in their personal genomes and use apps provided by PaaS companies to better understand the data, visualize it, and translate it into actionable information that can be shared with physicians.
Storage and data processing problem:
• Storing petabytes of data is a huge hardware problem. Disk makers are not able to meet the storage demand of genomics, and chip manufacturers are not able to deliver the processing power to sequence genomes in under 15 minutes. Companies need to innovate on both fronts, so that genomes can be sequenced at a 10^-7 error rate in under 15 minutes a pop.
• Being able to index all of it — a Google of genomics?
Reads vs. variants: GitHub, Collaboration and Open Innovation Model
– Currently researchers share terabytes of genomics data by shipping it around on hard drives. To implement a cloud service, one needs a system like GitHub that uploads one origin read and then only the subsequent changes/differences against that origin. Furthermore, research should be conducted by building apps on top of the genomic data to analyze it. An open-source community can flourish if the research community adopts the open innovation model rather than the publication-grant model.
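A sketch of the “one origin, then diffs” idea in the spirit of how version control stores changes (the variant-list format here is my own simplification; real pipelines use formats like VCF):

```python
# Instead of shipping every full genome, store one reference sequence and,
# per individual, only the positions that differ (a variant list).
reference = "ATGCCGTAGGCT"

def diff(reference: str, genome: str):
    """Variant list: (position, reference base, observed base)."""
    return [(i, r, g) for i, (r, g) in enumerate(zip(reference, genome)) if r != g]

def apply_diff(reference: str, variants) -> str:
    seq = list(reference)
    for pos, _ref, obs in variants:
        seq[pos] = obs
    return "".join(seq)

genome = "ATGACGTAGGTT"
variants = diff(reference, genome)
print(variants)   # two tuples instead of the full sequence
```

Since any two human genomes are overwhelmingly identical, the variant list is tiny compared to the raw sequence, which is what makes a cloud “push the diff” model plausible.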
RSA security – EMC owns it
VMware – EMC owns it – virtualization for cloud
– No company has come up with compression software tailored to genomic data
– There is no JPG-, MPEG- or MP4-style compression standard, which is the status quo in the PC industry
– Right now scientists are mailing hard disk drives. Noobs.
– BaseSpace from Illumina is basically Dropbox for genomics: collaboration and file sharing for genomic data.
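The lowest-hanging fruit for genomic compression: DNA is a four-letter alphabet, so 2 bits per base instead of one 8-bit character gives a 4x reduction before any real compression even starts. A minimal sketch (the encoding is the standard 2-bit packing idea, not any specific product):

```python
# Pack each base into 2 bits (A=00, C=01, G=10, T=11): four bases per byte.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def pack(seq: str) -> bytes:
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))     # left-align a partial final chunk
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    seq = []
    for byte in data:
        seq.extend(BASE[(byte >> shift) & 0b11] for shift in (6, 4, 2, 0))
    return "".join(seq[:length])          # drop padding bases

seq = "ATGCCGTACG"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")
```

Real genomic compressors do far better than 4x by additionally storing only diffs against a reference and entropy-coding the quality scores, which dominate file size in practice.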
User Experience, Data Visualization:
• With all this data, UX designers need to reduce it to what humans can understand. E.g., flight cockpits show a horizon line so the pilot can tell whether the plane is going up or down and maintain altitude. The same thing needs to happen with genomics: navigating scientists to the right area of the code base is an unmet need.
• Any sufficiently advanced technology is indistinguishable from magic, as Arthur C. Clarke said. GPS reduces its trilateration technology to a simple, turn-by-turn graphical user experience.
– Patterns, measurable trends in genomes from a particular race, age group, family tree etc
This is a small excerpt from Seed Magazine’s article on Fernando Esponda’s work on the Negative Database, designed to work much like the human immune system.
Information puts Fernando Esponda in a negative state of mind. Which is exactly why he’s poised to overturn conventional ideas in information science. His innovative research began as he was working toward his doctorate in computer science and became interested in the human immune system. “What caught my eye was that the information being used by the immune system was a negative kind of information,” he says. That is, the immune system doesn’t have a record of every possible pathogen that could invade the body. Instead, it learns what the body itself looks like and knows to go on the offensive when it encounters anything that doesn’t match its definition of “self.” “Can we do the same thing with data?” Esponda began to wonder. “Can we take a database, and instead of storing everything that’s in it, can we store everything that is not in it?”
The idea sounded preposterous, and Esponda’s colleagues told him as much. But in his dissertation, Esponda demonstrated that a negative database could be created, stored, and manipulated effectively and efficiently—setting the stage for a revolution in information science. Now Esponda is working to determine when negative databases might possess significant advantages over the status quo. One obvious application, he says, involves improving data security. Though it is sometimes possible to work backward from a negative database to its positive complement, negative databases are still tougher to crack, especially if they are divided into several parts and stored in separate locations.
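A toy of Esponda’s idea over a tiny universe makes the logic concrete (this naive version stores the complement explicitly; real negative databases use compressed representations with wildcards, since the full complement is exponentially large):

```python
from itertools import product

# Universe: all 3-bit strings, standing in for a small record space.
universe = {"".join(bits) for bits in product("01", repeat=3)}

positive_db = {"010", "110"}            # the records we actually hold
negative_db = universe - positive_db    # store everything that is NOT in it

def contains(record: str) -> bool:
    """Answer membership from the negative side alone, like the immune
    system recognizing anything that is 'not self'."""
    return record not in negative_db

print(contains("010"), contains("011"))  # True False
```

The security angle from the article follows directly: an attacker who steals one fragment of a split negative database learns only some strings that are absent, which says little about what is present.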
Esponda, who works at the Mexico Autonomous Institute of Technology (ITAM), is exploring scenarios in which it could be useful to collect negative data about behaviors that people might find sensitive, and about which they might lie. For instance, he says, researchers can design a survey question asking women how many abortions they’ve had, giving them five answers to choose from. But rather than asking a woman to check the box that applies to her, the survey could ask her to check one of the boxes that does not apply. Reverse surveys reveal only a little about each subject but a lot about a population, and they can accurately estimate how common a behavior is without anyone having to admit to it. “Every choice you make leaves a choice of inaction, of what you did not do,” Esponda says. “I’m trying to show exactly what can be learned from this inaction. How can we gain insight from what is not there?”
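The reverse-survey arithmetic can be simulated. Assume (my reading of the protocol, not spelled out in the excerpt) that each respondent checks one box that does not apply, chosen uniformly at random among the four non-applicable options. Then the expected number of marks on option i is (N − n_i)/4, so the true count is recoverable as n_i = N − 4 · marks_i, without any individual admitting anything.

```python
import random

random.seed(1)
truth = [400, 300, 150, 100, 50]    # hypothetical true counts; N = 1000
N = sum(truth)
options = range(len(truth))

# Each respondent marks one option that does NOT apply to them.
marks = [0] * len(truth)
for answer, count in enumerate(truth):
    for _ in range(count):
        marks[random.choice([o for o in options if o != answer])] += 1

# Recover the population counts: n_i = N - 4 * marks_i (in expectation).
estimate = [N - 4 * m for m in marks]
print(estimate)   # close to the true counts
```

Note what is revealed per person: a single mark rules out only one of five options, so each individual record stays nearly uninformative while the aggregate converges on the true distribution.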