The 1st Epigenome Informatics Workshop

September 10th-12th, 2009, Houston, Texas


There will be five sessions spread over 2.5 days. Sessions 1-3 will be conducted by EDACC staff. Session 4 will be conducted by REMC and disease project representatives. Session 5 will focus on potential collaborative projects between EDACC and other NIH Epigenomics Roadmap Initiative members.

The workshop will include 3 exercises (as part of Sessions 1-3). The following data generated by the consortium may be used for the exercises:

              H1 cell line     IMR90 cell line
ChIP-seq      6 marks          ?
Methylome     Bisulfite-seq
smRNA-seq     yes              yes
mRNA-seq      yes              yes

Thursday, Sep 10th

Session 1: Data Flow and Genboree
This session will have a dual focus: (a) Epigenomic Data Flow and (b) Introduction to Genboree
Opening and Introduction – Art Beaudet and Aleks Milosavljevic
Data flow overview – Aleks Milosavljevic
Metadata, raw and analysis data, data flow
10:00am-10:30am Genboree feature overview – Alan Harris
Databases, tracks, projects, groups, users, access control
10:30am-10:45am Break
10:45am-12noon Genboree Exercises – Alan Harris
12noon-1:00pm Lunch

Session 2: Primary Data and Pipelines (Galaxy and Genboree integration, Tools)
This session will have a dual focus: (a) Primary Data Analysis and (b) Galaxy/Genboree integration
1:00pm-1:30pm Galaxy-Genboree integration – Alan Harris
Integration of Galaxy pipelines and Genboree via Genboree APIs
1:30pm-2:15pm Primary Data Analysis Pipelines – Cristi Coarfa
Read mapping via Pash. Defining and computing Level 2, 3 data for various assays and pipelines. Defining verification procedures.
2:15pm-3:45pm Primary Data Analysis Exercises – Cristi Coarfa
Running Genboree-integrated EDACC-hosted Galaxy reference pipelines to perform primary analysis for various assays.
3:45pm-4:00pm Break
4:00pm-5:00pm Data Analysis Tools – Wei Li
MACS, BS-seq, others.
5:00pm-6:00pm EDACC Housewarming party (Suite 400D)
6:30pm-8:00pm Dinner (Trevisio restaurant, Texas Medical Center)

Friday, Sep 11th

Session 3: Integrative Data Analysis and Visualization and Genboree APIs
This session will have a dual focus: (a) Integrative Data Analysis and (b) EDACC-hosted methods and tools relevant to Integrative Analysis: Virtual Genome Painting and Genboree APIs
9:00am-10:00am Integrative Data Analysis and Visualization – Cristi Coarfa
Finding differences: ChIP-seq, MACS and multiscale. Visualization in the context of UCSC tracks (by using UCSC service or Genboree warehousing)
10:00am-10:15am Integrative Data Analysis and Visualization Exercises – Cristi Coarfa
Virtual Genome Painting, Circos.
10:15am-11:00am Break
11:00am-12noon Using Genboree APIs to retrieve integrated data sets – Andrew Jackson
Retrieving data for integrative analysis of multiple marks, clustering and heatmap visualization.
12noon-1:00pm Lunch

Session 4: Participant-supplied topics: problems, solutions, and opportunities addressable in collaboration between REMCs or disease-focused projects and EDACC

A sampling of problems/solutions/opportunities that may be addressed:

1. Sharing pre-publication data among members of collaborating groups. Examples: sharing pre-publication data generated by REMCs among REMCs; sharing pre-publication data generated by disease projects among disease projects.
2. Pash mapping service for all assays (EDACC automatically generates Level 1 data in SAM format for all assays upon REMC data submission)
3. Development of reference pipelines: defining and computing Level 2, 3 data for all assays
4. Defining verification procedures and coordinating REMC verification and EDACC verification.
5. Defining epigenome similarity measures (multiscale) to identify global/local similarities and differences.
6. Warehousing external epigenomic and other data sets of relevance for integrative analysis
7. Developing, integrating, and serving up visualization tools: VGP, Circos, heatmaps, others
8. Integrative analysis: Serving up integrated data sets via Genboree APIs

1:00pm-1:15pm Introduction
1:15pm-1:45pm UW: Bob Thurman
Multiscale analysis, ChIP-seq and chromatin accessibility.
1:45pm-2:15pm UCSF/UBC/UCSC: Nina?
Histone modification marks algorithms and analysis
2:15pm-2:45pm UCSD: Lee Edsall
Comparing lanes, peak calling, bisulfite sequencing mapping.
2:45pm-3:15pm Broad: Noam Shoresh
Discussion of topics of interest rather than a presentation. Peak calling, segmentation, analysis issues.
3:15pm-3:30pm Break
3:30pm-4:30pm Disease projects
4:30pm-5:00pm Summary

Saturday, Sep 12th

Session 5: Outline of potential collaborative projects
9:00am-9:45am Outline of Collaborative Project 1
9:45am-10:30am Outline of Collaborative Project 2
10:30am-10:45am Break
10:45am-11:30am Outline of Collaborative Project 3
11:30am-12noon Closing Remarks and Adjourn – Aleks Milosavljevic

