Computational Biology Tools from Microsoft Corporation

The Tools

  • PhyloD
    • Pathogens live and reproduce inside the human host, whose immune system continually tries to rid the body of them. This leads to a tug-of-war in which the pathogen adapts to “escape” the immune system, while the immune system learns to recognize and eliminate new foreign pathogens. Key players in the immune system are the HLA proteins, each of which can recognize specific short fragments of foreign (e.g. HIV) proteins, called epitopes, in infected cells and then alert the immune system to their presence. For rapidly evolving pathogens like HIV, a key defense mechanism is to evolve mutations that prevent the HLA proteins from recognizing these epitopes. This evolution takes place anew in each patient, as each patient has a different set of HLA proteins that recognize different epitopes. PhyloD is a statistical tool that can identify HIV mutations that defeat the function of the HLA proteins in certain patients, thereby allowing the virus to escape elimination by the immune system. By applying this tool to large studies of infected patients, researchers are now able to start decoding the complex rules that govern HIV mutations, in the hope of one day creating a vaccine to which the virus is unable to develop resistance.
  • Epitope Prediction
    • This tool computes the probability that a given k-mer is a T-cell epitope restricted to a given HLA allele. It can scan for 8-, 9-, 10-, and 11-mer epitopes across all common HLA alleles.
  • HLA Completion
    • HLA sequence typing sometimes yields ambiguous results. For example, an allele may be identified as A6801/6802 or simply A02. This tool takes HLA typing data (loci A, B, and C) as input and probabilistically resolves the typing ambiguities (i.e., probabilistically “completes” the data to 4-digit resolution).
  • HLA Assignment
    • One way to find epitopes is to do lab studies such as ELISPOT. One problem with this approach is that, if you see a reaction in a patient, you don’t know which of the patient’s HLA genes is responsible for the reaction. This tool takes lab data from a series of patients and determines (probabilistically) which HLA genes are responsible for the reaction.
  • Create Epitome
    • This tool takes, as input, a weighted list of amino acid sequences. It creates epitomes of all lengths.
  • False Discovery Rate
    • Estimates the false discovery rate for 2×2 contingency tables, based on Fisher’s exact test.
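As a rough illustration of the idea behind the False Discovery Rate entry, the sketch below computes a two-sided Fisher exact p-value for each 2×2 table and then converts the p-values to q-values with the Benjamini-Hochberg step-up procedure. This is an assumption made for illustration: the source does not say which FDR procedure the Microsoft tool uses, and the function names and example tables are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical example tables, each given as (a, b, c, d).
tables = [(3, 1, 1, 3), (10, 0, 0, 10), (5, 5, 5, 5)]
pvals = [fisher_exact_two_sided(*t) for t in tables]
qvals = benjamini_hochberg(pvals)
for t, p, q in zip(tables, pvals, qvals):
    print(t, round(p, 4), round(q, 4))
```

A table would then be reported as significant when its q-value falls below a chosen threshold, which bounds the expected fraction of false positives among the reported tables.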
Source: CodePlex
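The scanning step described under Epitope Prediction can be pictured as sliding windows of length 8 to 11 over a protein sequence and scoring each window against each HLA allele. The sketch below shows only that enumeration. The scoring function is a caller-supplied placeholder, since the source does not describe the tool's model, and the sequence, function names, and constant score are hypothetical.

```python
def enumerate_kmers(protein, lengths=(8, 9, 10, 11)):
    """Yield (start, kmer) for every window of the given lengths."""
    for k in lengths:
        # range() is empty when the protein is shorter than k.
        for start in range(len(protein) - k + 1):
            yield start, protein[start:start + k]

def scan(protein, alleles, score):
    """Score every candidate k-mer against every allele.

    `score` is a function (kmer, allele) -> probability supplied by the
    caller; it stands in for the real prediction model.
    """
    return [(start, kmer, allele, score(kmer, allele))
            for start, kmer in enumerate_kmers(protein)
            for allele in alleles]

# Arbitrary 20-residue sequence and a constant placeholder score.
protein = "ACDEFGHIKLMNPQRSTVWY"
alleles = ["A*02:01", "B*57:01"]
results = scan(protein, alleles, score=lambda kmer, allele: 0.5)
print(len(results), results[0])
```

Keeping the scorer as a parameter separates the cheap enumeration from the model, so any per-allele predictor can be plugged in without changing the scan.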

Scope of Bioinformatics

India is set to take global leadership in genome analysis. India has large populations that can provide valuable information about disease predisposition and susceptibility, which in turn will help in drug discovery and related tasks.

However, India lacks records of clinical information about patients, and sequence data without clinical information has little meaning. Partnership with clinicians is therefore essential. The real money is in discovering new drugs ourselves, not in supplying genetic information and data to foreign companies, who would then use it to discover new molecules.

The genomic data provides information about the sequences, but it doesn’t give information about the functions. It is still not possible to predict the actual 3-D structure of proteins. This is a key area of work as tools to predict correct folding patterns of proteins will help drug design research substantially.

Given this, biotech and pharma companies need tremendous software support. Software expertise is required to write algorithms, develop software for existing algorithms, manage databases, and support the final stages of drug discovery.

Some major opportunity areas for IT companies include:

*  Improving the utility and content of databases

*  Tools for data generation, capture, and annotation

*  Tools and databases for comprehensive functional studies

*  Representing and analyzing sequence similarity and variation

*  Creating mechanisms to support effective approaches for producing robust software that can be widely shared

Pure cost benefits for biotech companies will definitely drive the bioinformatics industry in the country. In 2000, the biotech industry spent an estimated 36 percent on R&D. Success for many will mean a drastic reduction in R&D costs. Biotech companies will thus be forced to outsource software development rather than build proprietary software as in the past. Since the cost of programs for handling this data is extremely high in the West, Indian IT companies have a great business opportunity to offer complete database solutions to major pharmaceutical and genome-based biotech companies in the world.

The IT industry can also focus more on genomics through different levels of participation, such as hardware, database products and packages, implementation and customization of software, and functionality enhancement of databases.

Abraham Thomas, managing director, IBM India Ltd, says, “the alignment of a vast pool of scientific talent, a world-class IT industry, a vigorous generic pharmaceutical sector and government initiatives in establishment of public sector infrastructure and research labs are positioning India to emerge as a significant participant on the global biotech map.”

The Bioinformatics Society of India (Inbios) has been working since August 2001 to help put the Indian bioinformatics sector on the world map. Inbios already has over 270 members after a short span of one and a half years. It has become a common informal platform for the younger generation to learn about and contribute to this sunrise field in India.

Problems in the sector

The major issue for India is its transition from being a recognized global leader in software development to finding areas of real strength upon which it can capitalize in the biosciences. The identifiable areas are computational biology and bioinformatics, where substantial development skills are required to build custom applications that knit together and integrate disparate databases (usually from several global locations), simulations, molecular images, docking programs, etc.

Industry people, meanwhile, say that the mushrooming of bioinformatics institutes is making it harder to find talented and trained individuals. While many graduates have superficial knowledge and a certificate, India lacks true professionals in this area.

Most people who opt for bioinformatics come from the life sciences and have no exposure to the IT side of bioinformatics, which is very important. Another issue is that some companies face a shortage of funds and infrastructure. The turnaround time for an average biotech company to break even would be around three to five years.

Most venture capitalists and other sources of funding would not be very supportive, especially if the company is not part of a larger group venture. It would help if the government took an active role in building infrastructure and funding small and medium entrepreneurs.

The Bioinformatics Scenario in India for Entry-Level Students

1. Where are the commercial opportunities in bioinformatics in India?

In my view, the opportunities for bioinformatics in India are in research institutes and government-funded projects. There are not many opportunities in pharmaceutical companies and their R&D sections.

2. Do research institutes and universities favor bioinformaticians for their specialization, or do they take anyone who has passed NET/GATE or another eligibility test for a PhD?

Many research institutes and universities employ or enroll candidates from other streams of the life sciences, such as zoology, botany, genomics, and biotechnology, for PhDs and jobs in bioinformatics, even when those candidates have opted for bioinformatics only as a last chance to get employment in research. There are many places where bioinformatics work is going on without a single bachelor’s or master’s student of bioinformatics to do the work or to pursue further studies. As a bioinformatics student, I really feel that passing NET/GATE or another eligibility test is a harder task for a bioinformatics or pharmacoinformatics student than for students in other life science disciplines, because his or her curriculum covers 4-6 programming languages, mathematics, statistics, bioinformatics, and molecular biology instead of chemistry, physics, zoology, and botany.

Future prospects of bioinformatics:

As discussed in an earlier post about CUDA (source:  http:// ), bioinformatics is gaining a new way of accelerating its applications, ultimately leading to genome analysis of millions of base pairs in a few seconds on your notebook or personal computer. So there are lots of opportunities coming our way.

Related information and discussions are welcome. Send them to

Accelerate Bioinformatics Applications


source : http://

Compute Unified Device Architecture (CUDA) is a parallel computing architecture developed by NVIDIA. CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through industry-standard programming languages. Programmers use ‘C for CUDA’ (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. The CUDA architecture supports a range of computational interfaces, including OpenCL and DirectX Compute. Third-party wrappers are also available for Python, Fortran, Java, and Matlab.

Sequencing and protein docking

Sequencing and protein docking are very compute-intensive tasks that see a large performance benefit from a CUDA-enabled GPU. There is quite a bit of ongoing work on using GPUs for a range of bioinformatics and life sciences codes.

Figures: Accelerating HMMER using GPUs (Scalable Informatics); MUMmerGPU: High-through DNA sequence

Molecular dynamics

Molecular dynamics applications are extremely amenable to the massively parallel architecture of NVIDIA’s GPUs. In the charts below, we highlight work done on VMD and also molecular dynamics software packages such as NAMD and HOOMD.

Figures: Ion Placement in VMD (Stone, Phillips, Hardy, Schulten); HOOMD on 1 NVIDIA GPU outperforms 16 CPU cores running LAMMPS (Anderson, Lorenz, Travesset)
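
To give a sense of why molecular dynamics maps well onto massively parallel hardware, the sketch below computes a Lennard-Jones energy in plain Python. Each pair term depends only on the positions of two particles, so the terms can be evaluated independently and then summed, which is the kind of work a GPU can spread across many threads. This is not code from VMD, NAMD, HOOMD, or LAMMPS; the function names, the reduced units (epsilon = sigma = 1), and the absence of a cutoff or periodic boundaries are simplifying assumptions.

```python
from itertools import combinations
from math import dist

def lj_pair_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 potential for two particles at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(positions, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all unique pairs (O(N^2), no cutoff)."""
    return sum(lj_pair_energy(dist(p, q), epsilon, sigma)
               for p, q in combinations(positions, 2))

# Three hypothetical particles in reduced units.
positions = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.2, 0.0)]
print(total_energy(positions))
```

In a GPU implementation each thread would typically accumulate the terms for one particle or one block of pairs; the arithmetic inside each term stays the same.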