Section outline

    • Introduction to applied bioinformatics

      elective course

      Assoc. Prof. Ing. Petra Matoušková, Ph.D.

      contact: matousp7@faf.cuni.cz

    • course aims

      The aim of this course is to learn the basics of bioinformatics data, mainly gene and protein sequences: their formats, retrieval from databases, comparison, similarity searches, etc.

      The lectures are in the computer study room (S2250)

    • Organisation

      credit: 

      attendance (8/10) + homework: "HW" from each lecture.

      exam: 

      "written" on a computer, 2 sets of 5 tasks (1 hour, submitted via Moodle)

    • Literature

      Applied bioinformatics - An introduction, P. Selzer, 2018

      Bioinformatics for dummies, J.M. Claverie, C. Notredame, 2007

      Knowledge Discovery in Bioinformatics, X. Hu, Y. Pan, 2007

    • Topics

      1. Literature search

      2. Protein bioinformatics I - sequences, features, digestions

      3. Protein bioinformatics II - domains, transmembrane helices, BLAST

      4. Protein bioinformatics III + sum up I - sequence comparison, multiple alignment, 3D

      5. Nucleotide bioinformatics I - sequences, features

      6. Nucleotide bioinformatics II - translation, identification, sequencing

      7. Nucleotide bioinformatics III - RE digestion, primers

      8. Nucleotide bioinformatics IV -  cloning, specific primers

      9. Nucleotide bioinformatics V - qPCR primers, DNA/RNA secondary structure, mutagenesis primers

      10. Summary, examples

      11. Exam 2021


    • HUGO 

      Human gene nomenclature: HUGO


    • PubMed

      Searches the MEDLINE database. Full texts are available through the FAF login.

    • Web of Science

      The Web of Science database (Clarivate Analytics) includes bibliographic records from leading scientific journals in all fields. It enables reference management through "WebEndNote".

    • Scopus

      Abstract and citation database. Provides information about the H-index of a scientist.


    • Protein databases / sequence retrieval

      Expasy / UniProt

      Expasy is the Swiss bioinformatics resource portal, providing access to databases and software tools from a range of life sciences, including genomics, proteomics, systems biology, etc.

      UniProt is a high-quality, freely accessible resource of protein sequence and functional information.
      Detailed tutorial: 

      NCBI protein

      "National Center for Biotechnology Information Protein": the Protein database is a collection of sequences from several sources, including translations from annotated coding regions.

    • Protein sequence analyses

    • SMS: The Sequence Manipulation Suite - a collection of small JavaScript programs for various sequence manipulations (molecular weight, isoelectric point, statistics, range extractor, etc.)
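
      Two of the manipulations listed above, molecular weight and range extraction, can be sketched in a few lines of Python. This is only an illustration, not the SMS implementation: the function names are made up here, the table holds standard average residue masses, and the result may differ slightly from what SMS reports.

```python
# Average residue masses in daltons; one water is added per chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(protein):
    """Approximate average molecular weight (Da) of an unmodified protein."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def extract_range(sequence, start, end):
    """Return residues start..end (1-based, inclusive), as in a range extractor."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical example sequence
    print(round(molecular_weight(peptide), 2))
    print(extract_range(peptide, 2, 5))
```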


    • Simulation of protease cleavage

      PeptideCutter: predicts potential sites in a given protein sequence that are cleaved by proteases or chemicals.
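
      As an illustration of what such a prediction does, the sketch below applies the simplest textbook rule for trypsin: cut after K or R unless the next residue is P. This is an assumed simplification; PeptideCutter applies more detailed rules, and the function name is hypothetical.

```python
def trypsin_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if aa in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # C-terminal peptide
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R followed by P is not cleaved.
    for peptide in trypsin_digest("AAKGGRPLLKMM"):
        print(peptide)
```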

    • Searching for protein motifs and domains

      Searching databases for typical protein motifs/conserved domains enables the annotation of functional units in proteins, providing insights into sequence/structure/function relationships.

      Conserved domain database: NCBI/CD

      Other databases for domain search: SMART, InterPro

    • Signal peptides

      Protein localization is predicted by recognizing a signal peptide at the protein N-terminus: SignalP

    • Prediction of transmembrane helices

      Prediction is based on amino acid hydrophobicity and probability.

      Hydrophobicity profile: Expasy/ProtScale

      Transmembrane helices prediction: TMHMM, Phobius, TopCons (multiple programs consensus), CCTOP

      Figures: PROTTER
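
      The hydrophobicity profile mentioned above is essentially a sliding-window average of per-residue hydropathy values. A minimal sketch using the Kyte-Doolittle scale follows; the window size of 9 and the function name are assumptions chosen for illustration, and the dedicated predictors listed above use far more elaborate models.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every window of the given size along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
    sequence = "KKDDEE" + "LLIVALLIVAILLVA" + "RRKKDD"
    for position, value in enumerate(hydropathy_profile(sequence), start=1):
        print(position, round(value, 2))
```

      High positive values over a long stretch suggest a hydrophobic segment that could span a membrane.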


    • BLAST - searching for similarity

      (Basic Local Alignment Search Tool)

      Starting from short parts (words) of the query sequence, the program searches for similar sequences and scores potential alignments using a "substitution matrix".

      NCBI/BLAST tutorial: 
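
      The idea of searching with short parts of the query can be sketched as a word index. The example below finds only exact shared words of length 3 between two sequences; this is a deliberate simplification (real BLAST also accepts similar words scored with the substitution matrix and then extends the hits), and the function name is hypothetical.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs (0-based) of identical words."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject and look each of its words up in the index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


if __name__ == "__main__":
    # Hypothetical sequences sharing the stretch "KTAY".
    print(find_seeds("MKTAYI", "GGKTAYG"))
```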

    • Pairwise and multiple comparisons of protein sequences - (multiple) alignment

      Comparison is based on a substitution matrix.

      Pairwise global comparison: Needle (compares sequences over their full length)

      Pairwise local comparison: LALIGN (finds the most similar parts of two sequences)

      Multiple alignments:

      Multalin - a simple tool for comparison of two or more sequences

      Clustal Omega - can also display a phylogenetic tree

      Phylogenetic tree

      Advanced phylogeny here.
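
      The global pairwise comparison described above (full-length alignment, as performed by Needle) can be sketched with the Needleman-Wunsch dynamic programming scheme. The sketch assumes a simple match/mismatch score and a linear gap penalty in place of a real substitution matrix and affine gaps, so its output will not match the web tools exactly; all names are illustrative.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    total, top, bottom = global_align("ACGT", "AGT")
    print(top)
    print(bottom)
    print(total, percent_identity(top, bottom))
```

      Percent identity computed this way depends on the alignment length used as the denominator, which is one reason different tools can report slightly different values for the same pair.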

    • 3-D Structure

      PDB (Protein Data Bank)

    • Specific databases:

      Enzymes (Brenda), interactions (STRING)

    • Examples of typical tasks in exam test:

    • Ex1: Find two human DHRS7 sequences: DHRS7B (AAH09679.1) and DHRS7C (AAI47025.1). Run a pairwise alignment. How identical are these two proteins? Hint. Solution.

    • Ex2: Find the sequences of human NQO1 isoforms in UniProt and align them. How many isoforms are there? Compare the alignment output with the description of each isoform: is it correct? Hint. Solution.

    • Ex3: Download the sequence of the "unknown protein" (here). Using domain prediction, guess the function of the protein. Verify this by BLAST. What organism does it come from? Does it have any transmembrane helices?


  • Sequence comparison and translation

    • Comparisons of nucleotide sequences - (multiple) alignment

      Comparison is analogous to that of proteins. It is recommended to change the substitution matrix.

      Multiple (or pairwise) alignments: Multalin

    • Translation

      = translation of a nucleotide sequence into amino acids (protein) based on the genetic code

      SMS suite/ Translate → suitable only for a full CDS (or when the ORF is known)

      NCBI/ORFfinder → suitable for translation of any nucleotide sequence, looking for ORFs
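
      The difference between the two approaches can be sketched in Python: a plain translation reads codons from the first base, while an ORF search tries several reading frames. The sketch assumes the standard genetic code and scans only the three forward frames for ATG...stop stretches, which is simpler than what ORFfinder does; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base; '*' marks a stop, 'X' an unknown codon."""
    seq = dna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(dna):
    """Longest protein starting at M and ending at a stop, forward frames only."""
    best = ""
    for frame in range(3):
        protein = translate(dna[frame:])
        # Drop the last piece: it is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


if __name__ == "__main__":
    sequence = "CCATGGCCAAATAGTT"  # hypothetical sequence, ORF in the third frame
    print(translate(sequence))
    print(longest_orf(sequence))
```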

    • Unknown sequence identification

      BLASTn (searching nucleotide databases for similar nucleotide sequences)

    • DNA sequencing

      "Classic" Sanger sequencing (.scf, .abi, .ab1)

    • The .ab1 format is "unsupported": the file needs to be saved and then opened in Chromas.

    • Detection of "vector contamination" in unknown sequence

      VecScreen

      Removal of "vector contamination" from an unknown sequence

      SMS/Range Extractor DNA

    • Primer design

      PCR primer design

      OligoCalc - calculator of primer properties
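
      Two basic primer properties, GC content and a rough melting temperature, can be estimated by hand or with a short script. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides; calculators such as OligoCalc offer more accurate formulas. The primer and function names are hypothetical.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "ATGCATGCATGC"  # hypothetical primer
    print(gc_content(primer), wallace_tm(primer))
```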

    • Obtaining nucleotide sequence: NCBI

      Reverse complement: SMS

      Checking primer positions: multiple (or pairwise) alignment: Multalin
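
      The reverse complement and a simple position check can also be sketched in Python. The example assumes exact matches only: a forward primer is searched for directly in the template, a reverse primer through its reverse complement. An alignment tool tolerates mismatches, which this sketch does not; the sequences and function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base and reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def primer_position(template, primer, reverse=False):
    """1-based start of an exact primer match on the template, or None."""
    site = reverse_complement(primer) if reverse else primer
    index = template.upper().find(site.upper())
    return index + 1 if index != -1 else None


if __name__ == "__main__":
    template = "AAATGCCCGGGTTT"  # hypothetical template
    print(reverse_complement("ATGC"))
    print(primer_position(template, "ATGC"))
    print(primer_position(template, "AAAC", reverse=True))
```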