Cases

Case: Specification and design of a systems biology software platform

  • A platform with task-specific GUI applications for different purposes
    • Graphical bio-data analysis workflow design and enactment
    • Biological pathway network editor scaling to pathways containing hundreds of thousands of nodes
    • Graphical life science experiment design and documentation
    • Analysis of multi-dimensional numerical data
  • Phases
    • Requirement analysis
    • Requirement specification
    • Technical analysis
    • Technical design
    • End-user documentation

Case: Design and implementation of a bioinformatics analysis library

  • High-performance Java library
    • Data formats specification
    • Parser design and implementation
    • Java object model design and implementation
  • A collection of bioinformatics algorithms implemented on top of the library
    • Statistical analysis: descriptive statistics, hypothesis testing, dimensionality reduction, clustering, enrichment (overrepresentation) analysis, etc. 
    • Modeling and simulation tools
  • Support for import and export of pathway and network model data in different formats
    • SBML, BioPAX
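
The enrichment (overrepresentation) analysis listed above is commonly done with a one-sided hypergeometric test. The following is only a minimal sketch of that test, not the library's actual API (the library itself is written in Java). The function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of genes in the background set
    annotated:  background genes carrying the annotation of interest
    sample:     number of genes in the study list
    hits:       study-list genes carrying the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of seeing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the annotation occurs in the study list more often than random sampling would explain. In practice the result would still need correction for multiple testing.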

Case: Implementation of a native XML database

  • Native XML database on top of Microsoft SQL Server
    • Storage unit is an XML document instead of a row
  • Different document types indexed differently
  • Supported relational constraints 
    • document - document foreign keys
    • document - table foreign keys
  • Extremely fast document retrieval
  • SQL was used for querying

Case: End-user documentation

Authoring and editing the complete end-user documentation for a suite of bioinformatics desktop applications
  • Including user guides, reference manuals and tutorials
  • Creating a corporate technical writing style guide together with a language editing professional
 
Design and implementation of a single-source documentation system
  • Source available in a single XML source format (DocBook)
  • Support for internationalization (translation)
  • Automated processing system to produce output in various formats using XSLT
    • HTML
    • PDF output (through XSL-FO)

Case: Boolean modeling of T cell receptor (TCR) signaling

  • Building a Boolean network (on/off) model representing T cell receptor signaling
  • Simulating signal propagation from T cell receptor activation by synchronously updating the node values in the Boolean network
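
Synchronous updating means that every node computes its next value from the same current state, and all nodes then switch at once. The sketch below illustrates this on a toy four-node cascade. The node names and the wiring are illustrative assumptions and are not the model built in this case.

```python
def synchronous_step(state, rules):
    """Compute every node's next value from the same current state."""
    return {node: rule(state) for node, rule in rules.items()}


def simulate(state, rules, max_steps=50):
    """Iterate until a state repeats (fixed point or cycle) or max_steps is hit."""
    trajectory = [dict(state)]
    seen = {tuple(sorted(state.items()))}
    for _ in range(max_steps):
        state = synchronous_step(state, rules)
        trajectory.append(state)
        key = tuple(sorted(state.items()))
        if key in seen:
            break
        seen.add(key)
    return trajectory


# Hypothetical toy wiring: the receptor input is held constant and the
# signal moves one layer further down the cascade at each update.
rules = {
    "TCR": lambda s: s["TCR"],
    "LCK": lambda s: s["TCR"],
    "ZAP70": lambda s: s["TCR"] and s["LCK"],
    "ERK": lambda s: s["ZAP70"],
}
initial = {"TCR": True, "LCK": False, "ZAP70": False, "ERK": False}
trajectory = simulate(initial, rules)
```

Each entry of `trajectory` is one time step, so the list shows how far the activation has propagated after each update.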

Case: ODE modeling of PKC activation

  • Building a biochemical reaction network model of protein kinase C (PKC) activation in a neuron
  • Generating the corresponding system of ordinary differential equations (ODEs)
  • Simulating the response to incoming calcium pulses by solving the ODE model by numerical integration given initial and boundary conditions
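
As a rough illustration of the last step, the sketch below integrates a single mass-action equation, dA/dt = k_on * Ca(t) * (total - A) - k_off * A, with a fixed-step fourth-order Runge-Kutta scheme. The one-variable model, the rate constants, and the square calcium pulse are all assumptions made for the example. The PKC model in this case contains a full reaction network.

```python
def calcium_pulse(t, start=1.0, duration=2.0, amplitude=1.0):
    """Hypothetical square calcium pulse used as the time-dependent input."""
    return amplitude if start <= t < start + duration else 0.0


def dactive_dt(t, active, calcium, k_on=1.0, k_off=1.0, total=1.0):
    """Mass-action rate: calcium-driven activation minus first-order deactivation."""
    return k_on * calcium(t) * (total - active) - k_off * active


def rk4(f, y0, t0, t_end, dt):
    """Integrate dy/dt = f(t, y) with the classical fixed-step RK4 method."""
    steps = round((t_end - t0) / dt)
    t, y = t0, y0
    trace = [(t, y)]
    for i in range(steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt * k1 / 2)
        k3 = f(t + dt / 2, y + dt * k2 / 2)
        k4 = f(t + dt, y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = t0 + (i + 1) * dt
        trace.append((t, y))
    return trace


# Initial condition: no active enzyme at t = 0.
trace = rk4(lambda t, a: dactive_dt(t, a, calcium_pulse), 0.0, 0.0, 10.0, 0.01)
```

With these assumed parameters the active fraction rises while the pulse is on and decays back toward zero afterwards. A stiff or larger system would normally call for an adaptive solver instead of a fixed step.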

Case: Large-scale integration of public biological databases

  • A project integrating biological data from, among others, EBI and NCBI into an Oracle database
  • All major public databases, such as GO, UniProt, Ensembl, NCBI RefSeq, KEGG, TRANSFAC, IntAct, Reactome, etc.
  • Solving identity and namespace issues with objective and subjective criteria
  • In total about 30 databases integrated
  • Insert into an empty database, then update after each new version is published
  • Convert the data into the same relational format
  • Patent application number: EP06121967

Case: Combining biological data

  • A project integrating biological data into an Oracle database
  • Make the data comparable and combine duplicate rows, using several different criteria
    • Objectively the same: molecules have the same measurable attribute: DNA sequence, coding sequence (CDS), amino acid sequence, etc.
    • Subjectively the same: someone claims that two molecules are the same and their similarity is, for example, > 90%
  • Combine interaction data and component data
    • Reference model networks created for human and yeast (>100'000 nodes) (cf. Aho T. et al.)
    • Genes, Transcripts, Proteins (enzymes, transcription factors, signaling factors, etc.), Protein complexes, Metabolites, etc.
  • Patent application number: EP06121967
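
The two identity criteria above can be sketched as a small merge routine: records with identical sequences are always merged, while a claimed equivalence is accepted only when the similarity exceeds a threshold. The record identifiers, the function names, and the position-wise identity measure (a crude stand-in for an alignment-based similarity score) are assumptions made for this example.

```python
def identity(a, b):
    """Fraction of matching positions; a crude stand-in for alignment similarity."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def merge_records(sequences, claims, threshold=0.9):
    """Group record ids that refer to the same molecule.

    sequences: dict mapping record id -> sequence string
    claims:    list of (id_a, id_b) pairs asserted to be the same molecule
    """
    parent = {rid: rid for rid in sequences}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Objective criterion: identical sequences are the same molecule.
    first_seen = {}
    for rid, seq in sequences.items():
        if seq in first_seen:
            union(rid, first_seen[seq])
        else:
            first_seen[seq] = rid

    # Subjective criterion: accept a claim only above the similarity threshold.
    for a, b in claims:
        if identity(sequences[a], sequences[b]) > threshold:
            union(a, b)

    groups = {}
    for rid in sequences:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(group) for group in groups.values())


sequences = {
    "u1": "ACGTACGTACGTACGTACGT",
    "e1": "ACGTACGTACGTACGTACGT",  # identical to u1
    "r1": "ACGTACGTACGTACGTACGA",  # one substitution, 95% identical
    "x1": "TTTTTTTTTTTTTTTTTTTT",
}
claims = [("u1", "r1"), ("u1", "x1")]
groups = merge_records(sequences, claims)
```

Here `u1`, `e1` and `r1` end up in one group, while the claim linking `u1` and `x1` is rejected because the sequences are too dissimilar.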

Case: Bioinformatics data warehouse

  • Design of a relational model for bio-data component, system and state information (over 1500 tables)
  • Implemented XML model
    • Automated database creation
    • Automated upgrading of the database to the latest version
    • Reduced manually written, database-related Java code (200 000 lines of code generated automatically)
    • Reduced manually written PL/SQL code (400 000 lines of code generated automatically)
  • ETL: High performance "upsert" of new data from public and other biological databases
  • Versioning: multiple versions of rows, which enables
    • a static view of historical data
    • update-test-fix cycles on new data while users still see the previous version of the data
  • Patent application number: EP06121967
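
The row versioning idea can be illustrated with a very small sketch. It uses SQLite from the Python standard library purely as a stand-in for the Oracle and PL/SQL implementation. The table, the column names, and the sample values are hypothetical. A new row version is stored only when a value changes, and readers query the data as of a chosen version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene ("
    " gene_id TEXT, version INTEGER, symbol TEXT,"
    " PRIMARY KEY (gene_id, version))"
)


def load_release(conn, version, rows):
    """Upsert a release: add a new row version only when the value changed."""
    for gene_id, symbol in rows:
        latest = conn.execute(
            "SELECT symbol FROM gene WHERE gene_id = ?"
            " ORDER BY version DESC LIMIT 1",
            (gene_id,),
        ).fetchone()
        if latest is None or latest[0] != symbol:
            conn.execute(
                "INSERT INTO gene VALUES (?, ?, ?)", (gene_id, version, symbol)
            )


def view_as_of(conn, version):
    """Return each gene's newest row whose version is <= the requested one."""
    return conn.execute(
        "SELECT g.gene_id, g.symbol FROM gene g"
        " WHERE g.version = (SELECT MAX(version) FROM gene"
        "                    WHERE gene_id = g.gene_id AND version <= ?)"
        " ORDER BY g.gene_id",
        (version,),
    ).fetchall()


load_release(conn, 1, [("G1", "abc"), ("G2", "xyz")])
load_release(conn, 2, [("G1", "abd"), ("G2", "xyz"), ("G3", "new")])
```

Users can keep reading `view_as_of(conn, 1)` while release 2 is loaded and checked, and are switched to version 2 once it has been accepted.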

Case: Bioinformatics database administration

  • Oracle 9i, 10g, 11g
  • Implemented centralized management using Oracle Grid Control 10g
    • Over 100 managed targets (hosts, agents, listeners, databases, etc.)
    • Automated SLA monitoring and reporting, with beacons pinging the production system
    • Monitored hosts in Finland and China
  • 5 - 10 databases
    • Production, development and test databases
  • Largest database was 5 TB with over 2 000 000 000 rows
  • Automated RMAN backups, centralized scripts

Case: Network motif finder

  • Tool for finding motifs in large-scale biological networks:
    • 220 000 interactions
    • 160 000 interaction partners (genes, transcripts, proteins, compounds, etc)
    • almost 500 000 edges
  • Supports any number of nodes in the motif, tested with 20-node motifs
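
Motif finding can be viewed as a search for all mappings of a small pattern graph into the large network. The sketch below does this with plain exhaustive backtracking on directed graphs stored as adjacency sets. It is only an illustration of the problem and is not the algorithm used by the tool, which has to cope with networks far too large for a naive search. All names are hypothetical.

```python
def find_motif(graph, motif):
    """Yield mappings motif node -> graph node that preserve every motif edge.

    Both arguments are dicts mapping each node to the set of its successors.
    """
    motif_nodes = list(motif)

    def consistent(mapping, new):
        # Check every motif edge between `new` and the nodes mapped so far.
        for old in mapping:
            if new in motif[old] and mapping[new] not in graph[mapping[old]]:
                return False
            if old in motif[new] and mapping[old] not in graph[mapping[new]]:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(motif_nodes):
            yield dict(mapping)
            return
        new = motif_nodes[len(mapping)]
        for candidate in graph:
            if candidate in mapping.values():
                continue  # each graph node is used at most once
            mapping[new] = candidate
            if consistent(mapping, new):
                yield from extend(mapping)
            del mapping[new]

    yield from extend({})


# Feed-forward loop motif: x regulates y and z, and y regulates z.
motif = {"x": {"y", "z"}, "y": {"z"}, "z": set()}
graph = {"a": {"b", "c"}, "b": {"c"}, "c": {"d"}, "d": set()}
matches = list(find_motif(graph, motif))
```

In this toy network the only occurrence of the feed-forward loop maps `x`, `y`, `z` to `a`, `b`, `c`.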

Euformatics Oy, Keilaranta 4, 02150 Espoo, Finland
© Euformatics Oy 2012