以前是没有想过用这个软件的,直到有一个我的htseq无法对比对的bam文件进行基因计数(后来我才发现htseq无法计数的原因是gtf版本不同导致坐标不同,而且gtf对染色体编号没有加上chr),我简单搜索了一下,发现bedtoolsmulticov也有类似的功能,所以我选择它来试试看!
首先注意它需要sort的bam文件及bam的index
bedtoolsmulticovdependsuponindexBAMfilesinordertocountthenumberofoverlapsineachBAMfile.Assuch,eachBAMfileshouldbepositionsorted(samtoolsortaln.bamaln.sort)andindexed(samtoolsindexaln.sort.bam)witheithersamtoolsorbamtools.
首先安装它:
解压开后
Makeclean
Makeall
然后就可以看到它的bin目录下全部是程序啦
命令很简单的
bedtoolsmulticov[OPTIONS]-bamsBAM1BAM2BAM3...BAMn-bed
Bydefault,multicovwillreportthecountofalignmentsineachinputBAMfilethatoverlap.
例子:
bedtoolsmulticov-bamsaln1.bamaln2.bamaln3.bam-bedivls-of-interest.bed
ivls-of-interest.bed这个文件是必须的,可能需要自己制作,其实用gtf文件也可以的
chr1010000ivl1
chr11000020000ivl2
chr12000030000ivl3
chr13000040000ivl4
输出结果前三列是坐标,第四列是基因名,跟我们的bed文件一样,只是最后三列是三个样本的计数,是添加上来的!
chr1010000ivl110022340
chr11000020000ivl212332451000
chr12000030000ivl321323322034
chr13000040000ivl433576540
同样是对gene的reads计数,bedtools的multicov工具与htseq-count的区别是
i'dguessit'sduetothefactthathtseq-countonlyreportsonehitperalignedreadassumingthatreadisaligneduniquelyanddoesnotoverlapmultiplefeatureslistedinyourGTF.ifanalignedreadhitsmorethanonefeatureinyourGTFthenitdoesn'treportthathit.bedtoolsgivesyourawhitswhichincludesevery1hitforeveryintersectionofeveryalignmentwithanyfeaturesintheGTFnomatterhowmanytimesitalignedorhowmanyfeaturesithit.youmightthink,"wow,htseq-countisdroppingalotofinformation".yes,itis!i'vemovedtousingothertoolstocounthitstogenes(RSEM/eXpress)sincetheydisambiguatethoseambiguousalignmentsandasaresultyougetcountsforallofyouralignedreads.inagenomewithalternativesplicingyoulosetoomuchdatausinghtseq-count,inmyopinion.
而且专门有个文献在讨论这个问题
DifferentialanalysisofgeneregulationattranscriptresolutionwithRNA-seq
下面我讲一个实际的例子
我的bam文件如下
bedtoolsmulticov-bams740WT1.bam741WT2.bam742KO1.bam743KO2.bam-bedmm9.bed
得到的这个矩阵就可以去用DESeq包来进行差异分析啦!