Suitable introductory book on Bioinformatics for a computer scientist?

Do you have any suggestions on a suitable introductory text on bioinformatics for a computer scientist? Any recommendations with pros and cons of different books would be appreciated. I'm mainly looking for a brief introduction since I'm a novice to this field.

I found some books like

  1. Systems Biology by Edda Klipp,
  2. Introduction to Bioinformatics by Arthur Lesk
  3. Bioinformatics For Dummies
  4. Bioinformatics and Functional Genomics by Jonathan Pevsner
  5. An Introduction to Bioinformatics Algorithms by Neil C. Jones and Pavel A. Pevzner

I would like to develop software related to sequencing applications, so any suggestions should ideally aid in this goal.

Systems biology and bioinformatics are quite diverse subjects, but here are some options:

  • Biological Sequence Analysis by Durbin et al.: the classic bioinformatics text for people with a CS background. Does not deal with networks.
  • An Introduction to Systems Biology by Uri Alon: A physicist's view of systems biology, focusing on design principles and networks. Interesting concepts and some math (no algorithms). Doesn't really deal with methodology.
  • Physical Biology of the Cell by Phillips et al.: Nice introduction to biophysics and quantitative biology (no algorithms). Many parts are relevant for systems biology.
  • Any diverse Machine Learning text will be relevant and many have examples from biology/bioinformatics.
  • Probabilistic Graphical Models are also quite popular in bioinformatics (e.g. Hidden Markov Models, Expectation-Maximization, Bayesian Networks). Probabilistic Graphical Models by Koller et al. is a very comprehensive CS book on the subject.
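Since hidden Markov models come up in several of these recommendations, here is a minimal sketch of Viterbi decoding, the standard way to find the most likely hidden state path for an observed sequence. The two states ("AT_rich", "GC_rich") and all probabilities below are made-up illustrative values, not taken from any of the books, and the function assumes a non-empty observation sequence.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path (log-space Viterbi)."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: regions of a DNA sequence are AT-rich or GC-rich.
STATES = ["AT_rich", "GC_rich"]
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT = {
    "AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

print(viterbi("AATTGCGCGC", STATES, START, TRANS, EMIT))
```

With these toy numbers, a single G inside a run of A's is not enough to trigger a state switch, because two transitions cost more than one unlikely emission.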

Another suggestion is to look for primers/reviews in relevant journals. This is important because the field is new: once a new technology appears, it takes time for the corresponding computational methodology to mature enough to make it into a textbook. These reviews can also point you to relevant research papers. For example, PLoS Computational Biology has an Education Collection which features a large variety of such reviews.

I second these two; I used both and liked them overall:

  • Bioinformatics and Functional Genomics by Jonathan Pevsner
  • An Introduction to Bioinformatics Algorithms by Neil C. Jones and Pavel A. Pevzner

For my bioinformatics class a few years ago, we used these:

  • Fundamental Concepts of Bioinformatics - Krane and Raymer

Was excellent as an intro text. Explained a lot of the biological concepts while it was teaching you the methods. Useful either for biologists or computer scientists bridging into bioinformatics.

  • Statistical Methods in Bioinformatics - Ewens and Grant

The go-to book for more comprehensive coverage of algorithms and applications. Did a good job covering the details of a method, like dynamic programming, from start to finish. Much more computer science oriented.
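To give a flavour of the dynamic programming mentioned above, here is a minimal sketch of global pairwise alignment scoring in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap penalty -2) are arbitrary illustrative choices, not taken from either book; real tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j]: best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]

print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(a) + 1) x (len(b) + 1) cells and each cell is filled in constant time, so the running time is proportional to the product of the two sequence lengths.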

For a really brief introduction, I recommend the 37 page paper

Bioinformatics - An Introduction for Computer Scientists by Jacques Cohen.

It is a well written text, assuming some knowledge about algorithms, computing and maths, but not about biology. It helped me a lot getting comfortable with the field.

Afterwards, I'd stick to Durbin's "Biological Sequence Analysis" (as already mentioned by Bitwise) as a book that provides you with the theoretical foundations of the models used in bioinformatics. It focuses on sequence processing, which is what you want as far as I understood your question. It is also quite accessible from a computer scientist's or mathematician's point of view.

7 Best Computational Biology Books

The development of next-generation sequencing and microarrays has led to massive data production in biology. This data requires thorough analysis and interpretation, and the field of computational biology came into existence for these tasks.

The field still has many unsolved problems. The study of human genes and genomes is broad, and many undiscovered domains can be identified with the help of the best computational biology textbooks. Reading can help computational biologists decipher the code of life.

We have prepared a list of the best computational biology books for learners of every level. These guides are well written and cover almost every topic needed for good biological research. Keep on reading to find out more about them.

This is a very broad question, so I am going to answer it with decent lists found across the internet, most of which are sorted by some criterion (e.g. bestsellers, top review counts). There are also so many bioinformatics books now that one strategy might be to simply go with your favorite publisher. Some come up often in CS areas, e.g. O'Reilly. Even the "x for Dummies" book series has a bioinformatics title, which might be of interest to those who like that publisher. Another option is to narrow it down by your algorithmic angle: e.g. there is one based on Python, and there are others that emphasize other statistical packages.

As for your tricky criterion of "inspirational", it may make sense to look for biographies of scientists in the field or stories about successful startups, and if the student is into CS, then programming exercises might verge on "inspirational".

Table of Contents

The Data: Storage and Retrieval
  • Basic Principles
  • The Data
  • Data Quality
  • Data Representation

Genome Sequence Analysis
  • Basic Concepts
  • Genome Sequencing
  • Finding the Genes
  • Statistical Methods to Search for Genes
  • Comparative Genomics
  • A Virtual Window on Genomes: The World Wide Web

Protein Evolution
  • Basic Concepts
  • Molecular Evolution
  • How to Align Two Similar Sequences
  • Similarity Matrices
  • Penalties for Insertions and Deletions
  • The Alignment Algorithm
  • Multiple Alignments
  • Phylogenetic Trees

Similarity Searches in Databases
  • Basic Principles
  • The Methods

Amino Acid Sequence Analysis
  • Basic Principles
  • Search for Sequence Patterns
  • Feature Extraction
  • Secondary Structure: Part One

Prediction of the Three-Dimensional Structure of a Protein
  • Basic Principles
  • The CASP Experiment
  • Secondary Structure Prediction: Part Two
  • Long-Range Contact Prediction
  • Predicting Molecular Complexes: Docking Methods

Homology Modeling
  • Basic Principles
  • The Steps of Comparative Modeling
  • Accuracy of Homology Models
  • Manual versus Automatic Models
  • Practical Notes
  • Summing Up

Fold Recognition Methods
  • Basic Principles
  • Profile-Based Methods
  • Threading Methods
  • The Fold Library
  • How Well Do These Methods Work?

New Fold Modeling
  • Basic Principles
  • Estimating the Energy of a Protein Conformation
  • Energy Minimization
  • Molecular Dynamics

The “Omics” Universe
  • Basic Principles
  • Structural Genomics
  • But This Is Not All

Useful Web Sites

A Glossary, References, and Problems appear in each chapter.

Bioinformatics Books List

Pierre Baldi and Søren Brunak, Bioinformatics: The Machine Learning Approach. This book describes key machine learning approaches to molecular biology, including neural networks and hidden Markov models. It is intended for two audiences: computer scientists / mathematicians and molecular biologists.

Pavel Pevzner, Computational Molecular Biology In one of the first major texts in the emerging field of computational molecular biology, Pavel Pevzner covers a broad range of algorithmic and combinatorial topics and shows how they are connected to molecular biology and to biotechnology. The book has a substantial "computational biology without formulas" component that presents the biological and computational ideas in a relatively simple manner. This makes the material accessible to computer scientists without biological training, as well as to biologists with limited background in computer science.

M.J. Bishop and C.J. Rawlings (editors), DNA and Protein Sequence Analysis---A Practical Approach IRL Press at Oxford University Press, 1997 ISBN 0 19 963464 5 (Hbk) ISBN 0 19 963464 7 (Pbk) An excellent introduction to Internet resources for molecular biologists. Unique coverage of issues relating to analysis of genomic sequence data. Provides an overview of the science underlying modern bioinformatics tools. Deals explicitly with the issues of interpreting results from computer analysis of DNA and protein sequence - not just how to run the programs.

R. Durbin and S. Eddy and A. Krogh and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. A tutorial introduction to the use of hidden Markov models, stochastic context free grammars and other probabilistic models for sequence analysis problems in computational molecular biology.

W. Ewens and G. Grant, Statistical Methods in Bioinformatics Springer-Verlag NY This book grew out of a need to teach bioinformatics to graduate students at the University of Pennsylvania. At the same time however, it is organized to appeal to a wider audience. In particular it should appeal to any biologist or computer scientist who wants to know more about the statistical methods of the field, as well as to a trained statistician who wishes to become involved in bioinformatics. The earlier chapters introduce the concepts of probability and statistics at an elementary level, and will be accessible to students who have only had introductory calculus and linear algebra.

S.L. Salzberg, D.B. Searls, and S. Kasif (eds.), Computational Methods in Molecular Biology Elsevier Science B.V., Amsterdam, 1998. This book describes a range of different computational approaches to gene finding, protein structure prediction, and other sequence analysis problems. Tutorial material is included to make topics such as hidden Markov models accessible to the reader with a background in biology. Other tutorials are included to introduce biological concepts to readers whose background is computer science or something else besides molecular biology.

J. Setubal and J. Meidanis, Intro. to Computational Molecular Biology, PWS Publishing Co., 1997.

M.S. Waterman, Introduction to Computational Biology. This book is intended to introduce someone who has advanced mathematical skills to the subject of biological data and problems.

C.H. Wu, Neural Networks and Genome Informatics. This book is a comprehensive reference in the field of neural networks and genome informatics. The tutorial of neural network foundations introduces basic neural network technology and terminology. This is followed by an in-depth discussion of special system designs for building neural networks for genome informatics, and broad reviews and evaluations of current state-of-the-art methods in the field. This book concludes with a description of open research problems and future research directions.

Proceedings of Intelligent Systems for Molecular Biology 1997. 375 pages recounting the presentations and events of this critical conference, which was held in Greece in 1997.

Michael J. E. Sternberg, editor, Protein Structure Prediction---A Practical Approach Oxford University Press, London, 1996. A very good introductory tutorial for molecular biologists. Covers Sequence databases, alignment and secondary & tertiary structure prediction.

Sequence Analysis Primer. Computerized sequence analysis is an integral part of biotechnological research, yet many biologists have received no formal training in this important technology. Sequence Analysis Primer offers the necessary background to enter this exploding field and helps more seasoned researchers to fine-tune their approach.

BioInform also has a large list of computational biology books, with links to ordering sites.


Graduate students and post-baccalaureates serve as the laboratory instructors for BIOL 102. With each lab section accommodating up to 24 students, as many as 15 instructors may be on staff at any one time. With a goal of uniformity of rigor across the sections, the module developers, rather than the instructors, wrote both the postmodule and examination questions and provided comprehensive grading policies to standardize the scoring of the students’ responses on these assessment tools.

Assessment following the first implementation of the module in spring 2009 (n = 20) was encouraging; therefore, the scope of the assessment was expanded in 2010 to include sections taught by four different instructors (n = 78). The students fared very well on both assessment tools: an average of 92% on the questions that were administered during the lab period within minutes of completion of the module (“Post_Module”), and 88% on an examination question measuring understanding and retention 1 week later (“Week_Later”), scoring equivalently on both assignments (Wilcoxon test, P = 0.34; Figure 1). Thus, students as a cohort seemed to retain the information they learned from performing the module and apply their newly acquired knowledge 1 week after the bioinformatics module was completed. Individually, however, students’ grasp of bioinformatics content weakened significantly after 1 week (paired Wilcoxon test, P = 0.0074). Although expected, the diminished retention suggests room for future enhancement and improvement in module design, implementation, or both.

Figure 1. Student performance on module assessment tools. Students’ percentile scores (n = 78) on the postmodule short-answer questions (“Post_Module”) are shown as triangles, and their scores on a single long-answer exam question 1 week later (“Week_Later”) as dots. Also plotted are the mean (star) and a range of two standard errors from the mean for each assessment tool. See text for discussion of this result.
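It may seem odd that the cohort-level comparison shows no difference while the paired comparison does. The sketch below illustrates how this can happen, using made-up scores for eight hypothetical students (not the study's data). It assumes the unpaired test is a Wilcoxon rank-sum test and the paired test is a Wilcoxon signed-rank test, and it computes exact p-values by brute-force enumeration, which is only practical for small samples without ties.

```python
from itertools import combinations, product

def paired_signed_rank_p(before, after):
    """Exact two-sided Wilcoxon signed-rank p-value.

    Sketch only: assumes no zero differences and no tied absolute differences.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = n * (n + 1) / 4
    # Under the null hypothesis every pattern of signs is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in enumerate(signs, start=1) if s)
        if abs(w - centre) >= abs(observed - centre):
            extreme += 1
    return extreme / 2 ** n

def unpaired_rank_sum_p(x, y):
    """Exact two-sided Wilcoxon rank-sum p-value (assumes no ties)."""
    pooled = sorted(x + y)
    rank_of = {value: i for i, value in enumerate(pooled, start=1)}
    observed = sum(rank_of[v] for v in x)
    n, total = len(x), len(x) + len(y)
    centre = n * (total + 1) / 2
    # Under the null hypothesis every choice of n ranks for x is equally likely.
    extreme = count = 0
    for subset in combinations(range(1, total + 1), n):
        count += 1
        if abs(sum(subset) - centre) >= abs(observed - centre):
            extreme += 1
    return extreme / count

# Hypothetical scores: every student drops a little, but students differ a lot.
post_module = [97, 92, 87, 82, 77, 72, 67, 62]
week_later = [96.5, 91, 85.5, 80, 74.5, 69, 63.5, 58]

print("unpaired:", unpaired_rank_sum_p(post_module, week_later))
print("paired:  ", paired_signed_rank_p(post_module, week_later))
```

In this toy data the spread between students is much larger than the drop within each student, so the two score distributions overlap heavily, yet every individual difference points the same way. The paired test uses that within-student information; the unpaired test discards it.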

Statistical Methods in Bioinformatics

Advances in computers and biotechnology have had a profound impact on biomedical research, and as a result complex data sets can now be generated to address extremely complex biological questions. Correspondingly, advances in the statistical methods necessary to analyze such data are following closely behind the advances in data generation methods. The statistical methods required by bioinformatics present many new and difficult problems for the research community.

This book provides an introduction to some of these new methods. The main biological topics treated include sequence analysis, BLAST, microarray analysis, gene finding, and the analysis of evolutionary processes. The main statistical techniques covered include hypothesis testing and estimation, Poisson processes, Markov models and Hidden Markov models, and multiple testing methods.

The second edition features new chapters on microarray analysis and on statistical inference, including a discussion of ANOVA, and discussions of the statistical theory of motifs and methods based on the hypergeometric distribution. Much material has been clarified and reorganized.

The book is written so as to appeal to biologists and computer scientists who wish to know more about the statistical methods of the field, as well as to trained statisticians who wish to become involved with bioinformatics. The earlier chapters introduce the concepts of probability and statistics at an elementary level, but with an emphasis on material relevant to later chapters and often not covered in standard introductory texts. Later chapters should be immediately accessible to the trained statistician. Sufficient mathematical background consists of introductory courses in calculus and linear algebra. The basic biological concepts that are used are explained, or can be understood from the context, and standard mathematical concepts are summarized in an Appendix. Problems are provided at the end of each chapter allowing the reader to develop aspects of the theory outlined in the main text.

Warren J. Ewens holds the Christopher H. Brown Distinguished Professorship at the University of Pennsylvania. He is the author of two books, Population Genetics and Mathematical Population Genetics. He is a senior editor of Annals of Human Genetics and has served on the editorial boards of Theoretical Population Biology, GENETICS, Proceedings of the Royal Society B and SIAM Journal in Mathematical Biology. He is a fellow of the Royal Society and the Australian Academy of Science.

Gregory R. Grant is a senior bioinformatics researcher in the University of Pennsylvania Computational Biology and Informatics Laboratory. He obtained his Ph.D. in number theory from the University of Maryland in 1995 and his Masters in Computer Science from the University of Pennsylvania in 1999.

Comments on the first edition:

"This book would be an ideal text for a postgraduate course…[and] is equally well suited to individual study…. I would recommend the book highly." (Biometrics)

"Ewens and Grant have given us a very welcome introduction to what is behind those pretty [graphical user] interfaces." (Naturwissenschaften)

"The authors do an excellent job of presenting the essence of the material without getting bogged down in mathematical details." (Journal of the American Statistical Association)

"The authors have restructured classical material to a great extent and the new organization of the different topics is one of the outstanding services of the book." (Metrika)

From the reviews of the second edition:

"Overall, Ewens and Grant have constructed a needed book in bioinformatics. It should help statisticians understand the emerging field of bioinformatics and serve as an introduction to bioinformatics for a statistician." Journal of the American Statistical Association, March 2006

"This book is the second edition of a book that was based on the content of a two-semester course in bioinformatics and computational biology … . is one of the most important books in this area from the perspective of teaching final year undergraduates and post-graduates in a range of disciplines. … this is a very good book, the best currently available for undergraduates and post-graduates at the intersection of computational biology, bioinformatics, statistics and applied mathematics and a worthwhile improvement on the first edition." (Mark Broom, Journal of the Royal Statistical Society, Vol. 169 (1), 2006)

"This is the second edition of Ewens and Grant’s very well written book on statistical methods in bioinformatics. … The authors have presented an excellent text for a graduate course … . It is clearly and interestingly written and is well organized and has comprehensive references to the literature. The writing style is excellent … . It is … truly a reference book for statistical methods in bioinformatics … . So I strongly recommend the book to both molecular biologists and statisticians … ." (Hamid Pezeshk, ISCB Newsletter, Issue 42, 2006)

"Ewens and Grant aim to fill a gap in the literature on statistics and probability in bioinformatics. … provides a review of the use of familiar statistical techniques and approaches to a new area. … it provides a rigorous treatment of statistical issues associated with bioinformatics tools and a strong statement of the statistical principles and philosophy which needs to underpin these tools. It admirably meets its objectives in this respect and is to be recommended." (David Lovell, Pharmaceutical Statistics, Issue 6, 2007)

"The most impressive achievement of this book is its development of blast theory. … The authors pace the knowledge flow smoothly. … The examples and exercises are well thought and highly motivated … . The authors do a fine job of emphasising the false discovery rate … . This book is structured perfectly for a textbook for everyone, statisticians, biologists and computer scientists. … I think this book does an excellent job in introducing many exciting statistical theories." (Lang Li, Briefings in Bioinformatics, Vol. 6 (4), 2005)

"In this book, Ewens and Grant seek to provide a link between bioinformatics and applied statistics. … The book provides detailed discussions of a number of useful distributions and highlights their role in bioinformatics. I found it quite useful and easy to follow. It is a good reference for multidisciplinary research teams in bioinformatics and students on some specialised taught courses." (Kassim S. Mwitondi, Journal of Applied Statistics, Vol. 33 (8), September, 2006)

A Practitioner's Guide to Data Management and Data Integration in Bioinformatics

Barbara A. Eckman, in Bioinformatics, 2003

Semantic Integration Featured Example: TAMBIS

The TAMBIS system is the result of a research collaboration between the departments of computer science and biological sciences at the University of Manchester in England. Its chief components are an ontology of biological and bioinformatics terms managed by a terminology server and a wrapper service that, as in DiscoveryLink, handles access to external data sources. An ontology is a rigorous formal specification of the conceptualization of a domain. The TAMBIS ontology (TaO) [42] describes the biologist's knowledge in a manner independent of individual data sources, links concepts to their real equivalents in the data sources, mediates between (near) equivalent concepts in the sources, and guides the user to form appropriate biological queries. The TaO contains approximately 1800 asserted biological concepts and their relationships and is capable of inferring many more. Coverage currently includes proteins and nucleic acids, protein structure and structural classification, biological processes and functions, and taxonomic classification.

The categorization of TAMBIS based on the six dimensions is given in Table 3.4.

TABLE 3.4. TAMBIS categorization with respect to the six dimensions of integration.

Interactive browser | Limited querying capability via parameterized query builder
According to TAMBIS' authors, its “big win” lies in the ontology | Integrates via its wrapper service
Not used | Uses BioKleisli for federated integration
Declarative Access | Procedural Access
Uses the CPL query language, but users see only the parameterized query builder | No procedural access
Information not available | Information not available
Relational Data Model | Non-Relational Data Model
Relational data model not used | Object/complex-relational data model

Intelligent Bioinformatics : The Application of Artificial Intelligence Techniques to Bioinformatics Problems

Intelligent Bioinformatics requires only rudimentary knowledge of biology, bioinformatics or computer science and is aimed at interested readers regardless of discipline. Three introductory chapters on biology, bioinformatics and the complexities of search and optimisation equip the reader with the necessary knowledge to proceed through the remaining eight chapters, each of which is dedicated to an intelligent technique in bioinformatics.

The book also contains many links to software and information available on the internet, in academic journals and beyond, making it an indispensable reference for the 'intelligent bioinformatician'.

Intelligent Bioinformatics will appeal to all postgraduate students and researchers in bioinformatics and genomics as well as to computer scientists interested in these disciplines, and all natural scientists with large data sets to analyse.


Author Bios

Edward Keedwell is an Associate Professor in Computer Science. He joined the Computer Science discipline in 2006 having previously been a Research Fellow in the Centre for Water Systems and was appointed as a lecturer in Computer Science in 2009.

Ajit Narayanan is the inventor of FreeSpeech, a picture language with a deep grammatical structure. He's also the inventor of Avaz, India's first Augmentative and Alternative Communication device for children with disabilities.

Think Raku: How to Think Like a Computer Scientist - 2nd edition

Contributors: Rosenfeld and Downey

Publisher: Green Tea Press

Think Raku is an introduction to computer science and programming intended for people with little or no experience.

Bioinformatics : A Practical Guide to the Analysis of Genes and Proteins

"In this book, Andy Baxevanis and Francis Ouellette . . . have undertaken the difficult task of organizing the knowledge in this field in a logical progression and presenting it in a digestible form. And they have done an excellent job. This fine text will make a major impact on biological research and, in turn, on progress in biomedicine. We are all in their debt."
Eric Lander, from the Foreword to the Second Edition

"The editors and the chapter authors of this book are to be applauded for providing biologists with lucid and comprehensive descriptions of essential topics in bioinformatics. This book is easy to read, highly informative, and certainly timely. It is most highly recommended for students and for established investigators alike, for anyone who needs to know how to access and use the information derived in and from genomic sequencing projects."
Trends in Genetics

"It is an excellent general bioinformatics text and reference, perhaps even the best currently available . . . Congratulations to the authors, editors, and publisher for producing a weighty, authoritative, readable, and attractive book."
Briefings in Bioinformatics

"This book, written by the top scientists in the field of bioinformatics, is the perfect choice for every molecular biology laboratory."
The Quarterly Review of Biology

This fully revised version of a world-renowned bestseller provides readers with a practical guide covering the full scope of key concepts in bioinformatics, from databases to predictive and comparative algorithms. Using relevant biological examples, the book provides background on and strategies for using many of the most powerful and commonly used computational approaches for biological discovery. This Third Edition reinforces key concepts that have stood the test of time while making the reader aware of new and important developments in this fast-moving field. With a new full-color and enlarged page design, Bioinformatics, Third Edition offers the most readable, up-to-date, and thorough introduction to the field for biologists.

This new edition features:

  • New chapters on genomic databases, predictive methods using RNA sequences, sequence polymorphisms, protein structure prediction, intermolecular interactions, and proteomic approaches for protein identification
  • Detailed worked examples illustrating the strategic use of the concepts presented in each chapter, along with a collection of expanded, more rigorous problem sets suitable for classroom use
  • Special topic boxes and appendices highlighting experimental strategies and advanced concepts
  • Annotated reference lists, comprehensive lists of relevant Web resources, and an extensive glossary of commonly used terms in bioinformatics, genomics, and proteomics

Bioinformatics, Third Edition is essential reading for researchers, instructors, and students of all levels in molecular biology and bioinformatics, as well as for investigators involved in genomics, clinical research, proteomics, and computational biology.
