Have a lot of fun. (Linus Torvalds)


This site accompanies my book, Computational Biology - Unix/Linux, Data Processing and Programming, published by Springer in 2004. Here you will find additional information. If you wish to buy the book, I recommend choosing the second edition.

The Book

This book is a practical introduction to Unix/Linux and programming for biologists, as well as for chemists and physicists who work in bioinformatics and biophysics. The goal is to learn about the power of the stream editor sed and the programming languages awk and perl in order to extract or format information from various sources.

awk is a great language both for learning programming and for processing large text-based data files (as opposed to binary files). In 99% of cases you will work with text-based files, be it data tables, genomes, species lists or environmental data. Apart from being simple to learn and having a clear syntax, awk lets you define your own commands. Thus, the language can grow with you as you grow with the language. perl is much more powerful but also less clear in its syntax (or more flexible, to put it positively). However, since awk was one of the foundations on which perl was developed, it is only a small step from awk to perl - but a giant leap for your possibilities. You should take this step. By the way, both awk and perl run on all common operating systems.

You have never touched a computer? Great! The book is written for total beginners with no computational knowledge. First you will learn what a computer is and how to work on it. Then, basic programming constructs are introduced and applied. After having worked through this book, you will be able to work in the Unix environment (BSD, Linux, Knoppix, Mac OS X, Cygwin) and to write programs that format and analyse large data sets.

Most programming examples are taken from biology; however, you need not be a biologist. Except for two or three examples, no biological knowledge is necessary. I have tried to illustrate almost everything practically with so-called terminals and examples. You should run these examples. Each chapter closes with some exercises. Brief solutions can be found at the end of the book.

I chose Linux because it is open source software: you need not invest money except for the book itself. Furthermore, Linux provides all the great tools Unix provides. With Linux (as with all other Unix derivatives) you are close to your data. Via the command line you have immediate access to your files and can process them with publicly available tools or with tools you have written yourself. With the aid of pipes you can construct your own data-processing pipeline. It is great.
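
To give an impression of what such a pipeline can look like, here is a minimal sketch. It is not taken from the book: the gene names, the sequences and the function names (sequences, gc_content, high_gc, gc) are made up for illustration. It combines a self-defined awk function with pipes to list all genes whose GC content exceeds 50 %.

```shell
#!/bin/sh
# Hypothetical input: gene name and sequence, separated by a tab (made-up data).
sequences() {
    printf 'geneA\tATGCGC\n'
    printf 'geneB\tATATAT\n'
    printf 'geneC\tGGGCCC\n'
}

# awk program with a self-defined "command": gc() returns the GC percentage.
gc_content() {
    awk -F '\t' '
        function gc(seq,    n) {          # n acts as a local variable
            n = gsub(/[GC]/, "", seq)     # gsub returns the number of replacements
            return 100 * n / (n + length(seq))
        }
        { printf "%s\t%.0f\n", $1, gc($2) }
    '
}

# Pipeline: compute GC content, keep genes above 50 %,
# sort by descending value and print only the gene names.
high_gc() {
    sequences |
        gc_content |
        awk -F '\t' '$2 > 50 { print $2 "\t" $1 }' |
        sort -rn |
        cut -f 2
}

high_gc
```

Each stage reads plain text from the previous one, so any stage can be tested on its own or swapped for another tool.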

Errors

p. 003 the Homepage is now obsolete
p. 049 the command given with find option -exec must not be enclosed in double quotes as shown in line 23
p. 057 In lines 2, 4, and 6 the version number 2.2.8 must be replaced by version 2.2.4 in the file names.
p. 068 lines 5-6 from the bottom should read: This is shown in line 13.
p. 104 The command in the last line should read . ./chg-pwd.sh
p. 112 line 20: not the option -z but the option -e gives true if the file exists.
p. 124 the command in line 6 should read: ${#array[*]}
p. 194 line 4 should start: Preceding the width with the minus...
p. 224 the descriptions for $++ / ++$ and $-- / --$ have to be exchanged
p. 234 section 12.4.1 - elseif should read elsif
p. 240 terminal 159 - line 2 has to be closed with a semicolon: $i=1;
p. 260 figure 12.1 - replace $m[6][3] by $m[5][2]
p. 260 figure 12.1 legend, line 6: replace cell [6,3] by cell [5,2]
p. 270 A.6.3: for most systems the solution to connect USB memory sticks is: mount -t vfat /dev/sda /home/Freddy/USBstick. This command requires the existence of the directory USBstick in the home directory.
p. 274 Solution 4.16: Creating and editing a file change its time stamp; renaming does not.
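
Three of the shell-related corrections above (p. 049, p. 104 and p. 112) can be tried out with the following sketch. It is an illustration only and does not reproduce the terminals from the book: the scratch directory, the file names and the script set-var.sh are hypothetical stand-ins, and mktemp is assumed to be available, as it is on common Linux systems.

```shell
#!/bin/sh
# Scratch directory so the demonstration does not touch real files.
tmp=$(mktemp -d)
touch "$tmp/empty.txt"

# p. 049: the command after -exec is given as separate words,
# not enclosed in double quotes as one string.
found=$(find "$tmp" -name '*.txt' -exec basename {} \;)

# p. 112: -e is true if the file exists; -z is true if a string has length zero.
if [ -e "$tmp/empty.txt" ]; then exists=yes; else exists=no; fi
if [ -z "" ]; then zero_length=yes; else zero_length=no; fi

# p. 104: ". ./script" runs the script in the current shell, so its variables persist.
# set-var.sh is a hypothetical stand-in, not the chg-pwd.sh from the book.
printf 'demo_var=sourced\n' > "$tmp/set-var.sh"
old_dir=$PWD
cd "$tmp" && . ./set-var.sh
cd "$old_dir"

echo "$found $exists $zero_length $demo_var"
rm -rf "$tmp"
```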

Literature

Where possible I added links to the literature cited in the book.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410

Deitel HM, Deitel PJ, Nieto TR, McPhie DC (2001) Perl How to Program. Prentice Hall, Upper Saddle River, NJ, ISBN 0-13-028418-1

Dougherty D, Robbins A (1997) sed & awk. O'Reilly & Associates, Sebastopol, CA, ISBN 1-56592-225-5

Dwyer RA (2003) Genomic Perl. Cambridge University Press, Cambridge, New York, ISBN 0-521-80177-X

Herold H (2003) sed & awk. Addison-Wesley, Bonn, Paris, ISBN 3-8273-2094-1

Lamb L, Robbins A (1998) Learning the vi Editor. O'Reilly & Associates, Sebastopol, CA, ISBN 1-56592-426-6

Levenshtein V (1966) Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10:707-710

Mangalam HJ (2002) tacg - a grep for DNA. BMC Bioinformatics 3:8

Oualline S (2001) Vi iMproved - VIM. New Riders Publishing, Indianapolis, IN, ISBN 0-7357-1001-5

Ritchie DM, Thompson K (1974) The UNIX Time-Sharing System. Commun ACM 17:365-375

Robbins A (1999) VI Editor Pocket Reference. O'Reilly & Associates, Sebastopol, CA, ISBN 1-56592-497-5

Schürmann P (2000) Plant Thioredoxin Systems Revisited. Annu Rev Plant Physiol Plant Mol Biol 51:371-400

Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E (2002) The Bioperl Toolkit: Perl Modules for the Life Sciences. Genome Res 12:1611-1618

Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673-4680

Tisdall J (2001) Beginning Perl for Bioinformatics. O'Reilly & Associates, Sebastopol, CA, ISBN 0-596-00080-4

Torvalds L, Diamond D (2001) Just for Fun: The Story of an Accidental Revolutionary. HarperBusiness, NY, ISBN 0-066-62073-2

Vromans J (2000) Perl 5. O'Reilly & Associates, Sebastopol, CA, ISBN 0-596-00032-4

Wünschiers R, Heide H, Follmann H, Senger H, Schulz R (1999) Redox control of hydrogenase activity in the green alga Scenedesmus obliquus by thioredoxin and other thiols. FEBS Lett 455:162-164