ChIP-seq analysis
Pipelines and scripts for ChIP-seq
I have been involved in a couple of projects that rely heavily on ChIP-seq data. See publications:
- male breast cancer project, published in Nature Communications
- prostate cancer project, published in Nature Communications
I wrote a Snakemake pipeline and Python scripts for robust, reproducible processing and visualization of the data.
Snakemake pipeline
The pipeline is hosted in a GitHub repository.
Roughly, the pipeline runs the following steps:
- Downloading raw data (BAM or FASTQ files) from the locations (local, remote, or GEO) specified in DataList.csv
- Alignment with bwa-mem (for FASTQ input only)
- Marking duplicate reads with Picard
- Removing low-quality reads (only reads with mapping quality > 20 are retained)
- Peak calling with MACS1.4/MACS2/DFilter (more than one peak caller can be used)
- Taking the intersection of the resulting peak sets
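The last step can be illustrated with a minimal pure-Python sketch. It is not the pipeline's own code: the function name `intersect_peaks`, the tuple layout, and the example coordinates are assumptions made for illustration. It also assumes that "intersection" means the regions covered by both peak sets, which may differ from how the pipeline defines it.

```python
from collections import defaultdict


def intersect_peaks(peaks_a, peaks_b):
    """Return the regions covered by both peak sets.

    Each peak is a (chrom, start, end) tuple describing a half-open
    interval [start, end), as in BED files. This is a quadratic
    per-chromosome scan, fine for a sketch but slow for large inputs.
    """
    # Group the second peak set by chromosome for quick lookup.
    b_by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        b_by_chrom[chrom].append((start, end))

    shared = []
    for chrom, start, end in peaks_a:
        for b_start, b_end in b_by_chrom.get(chrom, ()):
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # keep only non-empty overlaps
                shared.append((chrom, lo, hi))
    return sorted(shared)


# Hypothetical peaks from two different callers.
caller_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
caller_2 = [("chr1", 150, 250), ("chr1", 700, 800), ("chr2", 60, 70)]

consensus = intersect_peaks(caller_1, caller_2)
print(consensus)
```

Only regions supported by both callers survive, so a peak reported by a single caller (such as `chr1:500-600` here) is dropped.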
See also README.md in the repository.
Python scripts for visualization
pybedtools (see also the online documentation) is a Python wrapper for bedtools.