April 2010

  • SNP statistics using Oracle XE

    I was reading a blog entry by Jan Aerts in which he uses MongoDB to calculate statistics on SNPs from the 1000 Genomes Project in different populations: CEU (European descent), YRI (African) and JPTCHB (Asian). The question was how many SNPs those populations have in common. Jan Aerts used Ruby and MongoDB to answer it, and reported that calculating the statistics took about 55 minutes on his laptop. That seems unreasonably slow for a data set this small: there are only about 30 million SNP entries, or 800 MB of data. I will demonstrate here how the same thing can be done in about 3 minutes, including data loading.

    So I decided to try the same exercise using Oracle XE, the free (and limited) edition of the Oracle database. The most important limitation here is memory, as Oracle XE can use only a total of 1 GB. I gave most of it to the PGA (program global area), which is used for sort and hash operations. Since I am the only user of this database, I changed the memory management of work areas to manual mode and allocated more memory for hash and sort operations:
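
    The exact statements and sizes used for this are not shown here, so the following is only a minimal sketch of what such a session setup could look like. The parameter names are standard Oracle session parameters; the 512 MB values are assumptions chosen for illustration, not the settings actually used.

    ```sql
    -- Switch work-area sizing from automatic (PGA_AGGREGATE_TARGET) to manual,
    -- so the *_AREA_SIZE parameters below take effect for this session.
    ALTER SESSION SET workarea_size_policy = MANUAL;

    -- Memory for hash joins and sorts, in bytes.
    -- 536870912 bytes = 512 MB; these values are illustrative assumptions.
    ALTER SESSION SET hash_area_size = 536870912;
    ALTER SESSION SET sort_area_size = 536870912;
    ```

    Manual mode is reasonable here only because a single session is doing all the work; on a shared database, automatic PGA management is usually the safer choice.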

  • Euformatics Oy, Keilaranta 4, 02150 Espoo, Finland
© Euformatics Oy 2012