Genomic Feature Analysis

Pipeline overview:

  1. Make master genome feature bed file
    • download feature files (.gff and .bed) from OSF
    • concatenate files
  2. Use bedtools intersect to match DMRs to features in master genome feature file
  3. Generate background regions
    • find all CpGs covered by at least 3/4 samples per treatment group
    • then find CpGs common to all groups within each comparison (all ambient samples, all day 10 samples, etc.)
    • bin features in master genome feature bed file into 2 kb bins
      • the purpose of this is to make background regions more similar in size to DMRs; DMRs are between 6 bp and ~2 kb, whereas some features are much larger (e.g. an intergenic region can be > 10 kb).
    • determine binned features that overlap with covered CpGs
    • filter features for those with at least 3 covered CpGs
  4. Determine whether the number of DMRs within each feature category is significantly different from the number of covered (background) regions within that category
    • read data into R
    • create contingency tables for each comparison
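
The binning and covered-CpG filter in step 3 can be sketched roughly as below. This is a minimal illustration in plain Python, not the script used for the analysis: the function names (bin_feature, count_cpgs, background_regions), the in-memory tuple layout, and the toy coordinates are all assumptions made for the example. Intervals are treated as BED-style (0-based, half-open).

```python
from bisect import bisect_left

BIN_SIZE = 2000  # 2 kb bins, as in step 3
MIN_CPGS = 3     # minimum number of covered CpGs per binned feature


def bin_feature(chrom, start, end, label, bin_size=BIN_SIZE):
    """Split one interval (0-based, half-open) into bins of at most bin_size bp."""
    return [
        (chrom, s, min(s + bin_size, end), label)
        for s in range(start, end, bin_size)
    ]


def count_cpgs(sorted_positions, start, end):
    """Count positions p with start <= p < end in a sorted list."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def background_regions(features, covered_cpgs, min_cpgs=MIN_CPGS):
    """Bin every feature, then keep bins holding at least min_cpgs covered CpGs."""
    by_chrom = {chrom: sorted(pos) for chrom, pos in covered_cpgs.items()}
    kept = []
    for chrom, start, end, label in features:
        for region in bin_feature(chrom, start, end, label):
            n = count_cpgs(by_chrom.get(chrom, []), region[1], region[2])
            if n >= min_cpgs:
                kept.append(region)
    return kept


# Toy input, made up for illustration only (hypothetical coordinates).
features = [("chr1", 0, 5000, "intergenic"), ("chr1", 6000, 6500, "exon")]
covered_cpgs = {"chr1": [10, 50, 120, 2100, 4500, 4600, 4700, 6100]}

regions = background_regions(features, covered_cpgs)
for region in regions:
    print(*region, sep="\t")
```

In this toy case the 5 kb intergenic feature is split into three bins, and only the bins with three or more covered CpGs are kept as background. On real data the same logic would more likely be run with bedtools on the bed files rather than in memory.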


Script output




Chi-square table generated by the script
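
For reference, the test behind a table like this can be sketched as follows. This is an illustrative plain-Python version of Pearson's chi-square test of independence, not the R code used in the analysis. The table layout (rows = DMRs vs. background regions, columns = in a feature category vs. not) and the counts are assumptions made up for the example.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts, for illustration only:
#            in feature   not in feature
table = [
    [10, 20],  # DMRs
    [30, 40],  # background regions
]

stat, df = chi_square(table)
# For df == 1 only, the upper-tail p-value has the closed form erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(f"chi-square = {stat:.4f}, df = {df}, p = {p_value:.4f}")
```

Note that R's chisq.test applies a continuity correction to 2 x 2 tables by default, so its statistic will differ slightly from this uncorrected version unless correct = FALSE is set.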

Next steps

  • reduce the number of features
    • remove rRNA, since no data overlap with these features
    • consolidate exon, CDS, gene, mRNA, etc.
  • add new features
    • UTR
    • putative promoter
  • re-run analyses with updated features