When I tell people I do “bioinformatics,” the response is usually “bio-what?” And honestly, bioinformatics is hard to define even for people WITHIN the field. In this post I will describe what I see as the two main threads of bioinformatics.
First of all, there’s the issue of naming. To some people, “Computational Biology” is separate from “Bioinformatics,” but two people rarely agree on exactly what constitutes one or the other. Likewise, our PhD program is named “Bioinformatics and Systems Biology,” and according to my friend on the systems biology side, who worked in a Harvard lab during undergrad before coming to UC San Diego, there’s a huge difference between “East Coast Systems Biology” and “West Coast Systems Biology,” even though I can’t tell them apart.
Secondly, there’s a huge range of topics that count as bioinformatics in one way or another. The way I see it, there are two main divisions: “Classic Bioinformatics” and “Applied Bioinformatics.” These should be taken with a heavy grain of salt, as I am a newcomer to the bioinformatics field. These are my opinions on the current state, not absolute truth. Additionally, these two “halves” are symbiotic: each needs the other to survive.
“Classic Bioinformatics” is where it all started. It used to be that when people thought of bioinformatics, they thought of comparing sequences. A few years ago I read a blog post that stated “All of bioinformatics is essentially strcmp [string compare],” which made my blood boil, because comparing these strings is subtle and requires sophisticated algorithms to account for gaps, substitutions, and repetitive regions. Most of the tools below are analyzing strings or
The focus here is to make sense of the biological data we have and organize it into data structures.
These are the algorithms and tools that we learn in our first-year graduate school classes, like:
I see this as Computer science applied to biology.
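To show why sequence comparison is more than strcmp, here is a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. This is my own illustration, not code from any of the tools in this section. The scoring values (match = 1, mismatch = -1, gap = -2) and the function name `global_alignment_score` are arbitrary assumptions, and the sketch computes only the score, not the alignment itself.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Score of the best global alignment of a and b (no traceback).

    Scoring values are illustrative assumptions, not a standard matrix.
    """
    # prev[j] = best score aligning the processed prefix of a with b[:j].
    # Before any character of a is used, b[:j] can only be aligned to gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1]: either a match or a substitution.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


seq1 = "GATTACA"
seq2 = "GATACA"  # seq1 with a single T deleted

print(seq1 == seq2)                        # exact comparison only says "different"
print(global_alignment_score(seq1, seq2))  # alignment says "nearly identical"
```

An exact string comparison can only report that the two sequences differ. The alignment score reflects that they are the same apart from one gap. Real tools add much more on top of this (affine gap penalties, substitution matrices, heuristics for speed), which is exactly the subtlety that strcmp misses.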
“Applied Bioinformatics” is where we are now, and it is closer to what I do on a daily basis. Honestly, I don’t build any of the algorithms above, but I do need to understand them to know where my data is coming from.
Honestly, the skills I learned in my classes were a good basis for the broad field of biology, but I barely use them on a day-to-day basis. Mostly I
The focus here is to attribute biological function to data.
I see this as Data science applied to biology. Analyses like these take the output of the “Classic Bioinformatics” tools and use statistics, machine learning, and other applied tools to extract information.
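As a toy sketch of what “data science applied to biology” can look like, suppose an upstream tool has already turned sequencing reads into per-gene counts. The gene names, the counts, and the helper `log2_fold_change` below are all made up for illustration. A real analysis would use proper normalization and a statistical test rather than a bare ratio.

```python
import math
from statistics import mean


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of mean treated counts to mean control counts.

    The pseudocount (an assumption of this sketch) avoids dividing by
    zero or taking log(0) when a gene has no reads in one condition.
    """
    return math.log2((mean(treated) + pseudocount) /
                     (mean(control) + pseudocount))


# Hypothetical per-gene read counts, as might come out of an upstream
# alignment-and-counting step. These numbers are invented.
counts = {
    "geneA": {"control": [7, 7, 7], "treated": [31, 31, 31]},
    "geneB": {"control": [15, 15, 15], "treated": [15, 15, 15]},
    "geneC": {"control": [3, 3, 3], "treated": [0, 0, 0]},
}

for gene, c in counts.items():
    lfc = log2_fold_change(c["control"], c["treated"])
    print(gene, round(lfc, 2))
```

Positive values suggest a gene is more abundant in the treated samples and negative values suggest it is less abundant. Attaching a biological meaning to those numbers is the “Applied” part.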
What’s interesting is that this division is especially evident in the awardees of the International Society for Computational Biology (ISCB), where the Overton awards to early-career scientists tend to be more “Applied,” and the Senior Scientist awards tend to be more “Classic.”
As I am human, I certainly may have missed a paper or two that would push a person into one category or the other, and I am happy to correct any errors.