Tues. Aug. 6, 2019 Salmon + Sea lice BS data analysis

  1. Checksums for the trimmed data (adapters + 5bp from the 5’ end) copied from mox to Gannet:

    • ran this mox script: https://gannet.fish.washington.edu/metacarcinus/mox_jobs/20190805_md5cksm.sh

    • renamed output file and copied to Gannet

        mv /gscratch/srlab/strigg/data/Salmo_Calig/TRIM_1st5bp/checksums.md5 /gscratch/srlab/strigg/data/Salmo_Calig/TRIM_1st5bp/checksums.md5.mox
      		
        scp /gscratch/srlab/strigg/data/Salmo_Calig/TRIM_1st5bp/checksums.md5.mox strigg@ostrich.fish.washington.edu:/Volumes/web/metacarcinus/Salmo_Calig/analyses/20190722/TRIM_1st5bp
        Password:
        checksums.md5.mox                                                                                                  100%   12KB   3.2MB/s   00:00 
      
    • ran a comparison check of both [checksums.md5.mox] and [checksums.md5] to see if they are the same

        ostrich:TRIM_1st5bp strigg$ paste -d'\t' checksums.md5.mox checksums.md5 | awk '{if($1==$6)print $1,$6}' | head
        7036f0a884346dcbcdf5a66f24a9a926 7036f0a884346dcbcdf5a66f24a9a926
        e13367f5f3feb36753845d7e130de4aa e13367f5f3feb36753845d7e130de4aa
        dce1e7629182baeb37102cddb09db512 dce1e7629182baeb37102cddb09db512
        1170441e896de35738ed0300244d89fe 1170441e896de35738ed0300244d89fe
        ddb14c1f81ff1fe810b019dc13baf488 ddb14c1f81ff1fe810b019dc13baf488
        bb576b125bdebdd2750cc88996a02c59 bb576b125bdebdd2750cc88996a02c59
        5e6590529f5a4634c10b65e157d83e53 5e6590529f5a4634c10b65e157d83e53
        a3209d0e0d6ec8ecc04260b52de4e132 a3209d0e0d6ec8ecc04260b52de4e132
        cc25f84b0bced546d7573f9503ea4b98 cc25f84b0bced546d7573f9503ea4b98
        6f0113309650574e9b560de68f33e7f1 6f0113309650574e9b560de68f33e7f1
      		
        ostrich:TRIM_1st5bp strigg$ paste -d'\t' checksums.md5.mox checksums.md5 | awk '{if($1==$6)print $1,$6}' | wc -l
              88
      		      
        ostrich:TRIM_1st5bp strigg$ paste -d'\t' checksums.md5.mox checksums.md5 | awk '{if($1!=$6)print $1,$6}' | wc -l
               0
      
    • CONCLUSIONS: all 88 checksums match (0 mismatches), so copying from mox to Gannet worked fine
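
The comparison above can also be written so that it does not depend on column positions. This is a minimal sketch, not the commands that were run. It assumes (judging from the `$6` used above) that one file is in GNU `md5sum` format and the other in macOS `md5` format, and that both files list the data files in the same order. `compare_md5` is a hypothetical helper name.

```shell
# Compare the MD5 digests in two checksum files line by line.
# Usage: compare_md5 <checksums_a> <checksums_b>
# Exit status is 0 when every line matches and 1 otherwise.
compare_md5() {
  awk '
    # Return the first field that looks like an MD5 digest (32 hex characters).
    # Works for "<hash>  <file>" and for "MD5 (<file>) = <hash>" lines.
    function digest(    i) {
      for (i = 1; i <= NF; i++)
        if (length($i) == 32 && $i ~ /^[0-9a-fA-F]+$/) return tolower($i)
      return ""
    }
    FNR == 1  { file++ }                      # track which input file we are in
    file == 1 { first[FNR] = digest(); n1 = FNR; next }
    {
      n2 = FNR
      d = digest()
      if (d != "" && d == first[FNR]) same++
      else { diff++; print "MISMATCH line " FNR ": " first[FNR] " vs " d }
    }
    END {
      if (n1 > n2) diff += n1 - n2            # lines missing from the second file
      printf "%d matching, %d differing\n", same, diff
      exit (diff > 0)
    }
  ' "$1" "$2"
}
```

Called as `compare_md5 checksums.md5.mox checksums.md5`, it prints one summary line plus any mismatching pairs. Note that it uses `==` for the comparison; a single `=` inside an awk `if` assigns rather than compares.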

  2. Concatenating L001 and L002 .gz files
    • got the job to run on mox by changing the sbatch time limit (maintenance is scheduled for next week, so my previous job’s requested run time would have overlapped with it)
    • output saved on mox here: /gscratch/scrubbed/strigg/TRIM_cat/
    • transferred concatenated files to Gannet here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190722/TRIM_cat/

        ostrich:20190722 strigg$ rsync --archive --progress --verbose strigg@mox.hyak.uw.edu:/gscratch/scrubbed/strigg/TRIM_cat .
        Password: 
        Enter passcode or select one of the following options:
      		
         1. Duo Push to Android (XXX-XXX-0029)
         2. Phone call to Android (XXX-XXX-0029)
      		
        Duo passcode or option [1-2]: 1
        receiving file list ... 
        46 files to consider
        TRIM_cat/
        TRIM_cat/16C_26psu_1_S13_R1_001_trimmed.5bp_3prime.fq.gz
          2272571496 100%   37.97MB/s    0:00:57 (xfer#1, to-check=44/46)
        TRIM_cat/16C_26psu_1_S13_R2_001_trimmed.5bp_3prime.fq.gz
          2393036541 100%   37.47MB/s    0:01:00 (xfer#2, to-check=43/46)
        TRIM_cat/16C_26psu_2_S14_R1_001_trimmed.5bp_3prime.fq.gz
          1794524920 100%   38.64MB/s    0:00:44 (xfer#3, to-check=42/46)
        TRIM_cat/16C_26psu_2_S14_R2_001_trimmed.5bp_3prime.fq.gz
          1871002080 100%   35.30MB/s    0:00:50 (xfer#4, to-check=41/46)
        TRIM_cat/16C_26psu_3_S15_R1_001_trimmed.5bp_3prime.fq.gz
          1515096746 100%   35.59MB/s    0:00:40 (xfer#5, to-check=40/46)
        TRIM_cat/16C_26psu_3_S15_R2_001_trimmed.5bp_3prime.fq.gz
          1604258990 100%   38.59MB/s    0:00:39 (xfer#6, to-check=39/46)
        TRIM_cat/16C_26psu_4_S16_R1_001_trimmed.5bp_3prime.fq.gz
          1739852971 100%   36.87MB/s    0:00:45 (xfer#7, to-check=38/46)
        TRIM_cat/16C_26psu_4_S16_R2_001_trimmed.5bp_3prime.fq.gz
          1817746738 100%   33.96MB/s    0:00:51 (xfer#8, to-check=37/46)
        TRIM_cat/16C_32psu_1_S1_R1_001_trimmed.5bp_3prime.fq.gz
          1896054131 100%   34.35MB/s    0:00:52 (xfer#9, to-check=36/46)
        TRIM_cat/16C_32psu_1_S1_R2_001_trimmed.5bp_3prime.fq.gz
          1981391166 100%   38.21MB/s    0:00:49 (xfer#10, to-check=35/46)
        TRIM_cat/16C_32psu_2_S2_R1_001_trimmed.5bp_3prime.fq.gz
          1715567344 100%   38.37MB/s    0:00:42 (xfer#11, to-check=34/46)
        TRIM_cat/16C_32psu_2_S2_R2_001_trimmed.5bp_3prime.fq.gz
          1789513677 100%   36.53MB/s    0:00:46 (xfer#12, to-check=33/46)
        TRIM_cat/16C_32psu_3_S3_R1_001_trimmed.5bp_3prime.fq.gz
          1291766259 100%   35.51MB/s    0:00:34 (xfer#13, to-check=32/46)
        TRIM_cat/16C_32psu_3_S3_R2_001_trimmed.5bp_3prime.fq.gz
          1379258469 100%   34.33MB/s    0:00:38 (xfer#14, to-check=31/46)
        TRIM_cat/16C_32psu_4_S4_R1_001_trimmed.5bp_3prime.fq.gz
          1704589933 100%   36.14MB/s    0:00:44 (xfer#15, to-check=30/46)
        TRIM_cat/16C_32psu_4_S4_R2_001_trimmed.5bp_3prime.fq.gz
          1802754382 100%   36.61MB/s    0:00:46 (xfer#16, to-check=29/46)
        TRIM_cat/8C_26psu_1_S9_R1_001_trimmed.5bp_3prime.fq.gz
          1872155177 100%   38.20MB/s    0:00:46 (xfer#17, to-check=28/46)
        TRIM_cat/8C_26psu_1_S9_R2_001_trimmed.5bp_3prime.fq.gz
          1957282059 100%   35.13MB/s    0:00:53 (xfer#18, to-check=27/46)
        TRIM_cat/8C_26psu_2_S10_R1_001_trimmed.5bp_3prime.fq.gz
          1837426749 100%   38.07MB/s    0:00:46 (xfer#19, to-check=26/46)
        TRIM_cat/8C_26psu_2_S10_R2_001_trimmed.5bp_3prime.fq.gz
          1935888232 100%   37.95MB/s    0:00:48 (xfer#20, to-check=25/46)
        TRIM_cat/8C_26psu_3_S11_R1_001_trimmed.5bp_3prime.fq.gz
          1605119393 100%   34.60MB/s    0:00:44 (xfer#21, to-check=24/46)
        TRIM_cat/8C_26psu_3_S11_R2_001_trimmed.5bp_3prime.fq.gz
          1703938463 100%   34.62MB/s    0:00:46 (xfer#22, to-check=23/46)
        TRIM_cat/8C_26psu_4_S12_R1_001_trimmed.5bp_3prime.fq.gz
          1321058388 100%   38.23MB/s    0:00:32 (xfer#23, to-check=22/46)
        TRIM_cat/8C_26psu_4_S12_R2_001_trimmed.5bp_3prime.fq.gz
          1414160455 100%   37.52MB/s    0:00:35 (xfer#24, to-check=21/46)
        TRIM_cat/8C_32psu_1_S5_R1_001_trimmed.5bp_3prime.fq.gz
          1231884567 100%   37.73MB/s    0:00:31 (xfer#25, to-check=20/46)
        TRIM_cat/8C_32psu_1_S5_R2_001_trimmed.5bp_3prime.fq.gz
          1286891488 100%   38.51MB/s    0:00:31 (xfer#26, to-check=19/46)
        TRIM_cat/8C_32psu_2_S6_R1_001_trimmed.5bp_3prime.fq.gz
          1326114322 100%   37.53MB/s    0:00:33 (xfer#27, to-check=18/46)
        TRIM_cat/8C_32psu_2_S6_R2_001_trimmed.5bp_3prime.fq.gz
          1369513220 100%   35.69MB/s    0:00:36 (xfer#28, to-check=17/46)
        TRIM_cat/8C_32psu_3_S7_R1_001_trimmed.5bp_3prime.fq.gz
          1373099534 100%   35.87MB/s    0:00:36 (xfer#29, to-check=16/46)
        TRIM_cat/8C_32psu_3_S7_R2_001_trimmed.5bp_3prime.fq.gz
          1429454137 100%   35.57MB/s    0:00:38 (xfer#30, to-check=15/46)
        TRIM_cat/8C_32psu_4_S8_R1_001_trimmed.5bp_3prime.fq.gz
          1558622098 100%   34.77MB/s    0:00:42 (xfer#31, to-check=14/46)
        TRIM_cat/8C_32psu_4_S8_R2_001_trimmed.5bp_3prime.fq.gz
          1642882596 100%   30.95MB/s    0:00:50 (xfer#32, to-check=13/46)
        TRIM_cat/CTRL_16C_26psu_1_S19_R1_001_trimmed.5bp_3prime.fq.gz
          1914346417 100%   38.22MB/s    0:00:47 (xfer#33, to-check=12/46)
        TRIM_cat/CTRL_16C_26psu_1_S19_R2_001_trimmed.5bp_3prime.fq.gz
          2011802240 100%   34.10MB/s    0:00:56 (xfer#34, to-check=11/46)
        TRIM_cat/CTRL_16C_26psu_2_S21_R1_001_trimmed.5bp_3prime.fq.gz
          1275247795 100%   37.46MB/s    0:00:32 (xfer#35, to-check=10/46)
        TRIM_cat/CTRL_16C_26psu_2_S21_R2_001_trimmed.5bp_3prime.fq.gz
          1323060829 100%   31.40MB/s    0:00:40 (xfer#36, to-check=9/46)
        TRIM_cat/CTRL_8C_26psu_1_S17_R1_001_trimmed.5bp_3prime.fq.gz
          1422864827 100%   36.53MB/s    0:00:37 (xfer#37, to-check=8/46)
        TRIM_cat/CTRL_8C_26psu_1_S17_R2_001_trimmed.5bp_3prime.fq.gz
          1500712020 100%   32.79MB/s    0:00:43 (xfer#38, to-check=7/46)
        TRIM_cat/CTRL_8C_26psu_2_S18_R1_001_trimmed.5bp_3prime.fq.gz
          1481425749 100%   35.59MB/s    0:00:39 (xfer#39, to-check=6/46)
        TRIM_cat/CTRL_8C_26psu_2_S18_R2_001_trimmed.5bp_3prime.fq.gz
          1547852687 100%   38.00MB/s    0:00:38 (xfer#40, to-check=5/46)
        TRIM_cat/Sealice_F1_S20_R1_001_trimmed.5bp_3prime.fq.gz
         12379825515 100%   36.30MB/s    0:05:25 (xfer#41, to-check=4/46)
        TRIM_cat/Sealice_F1_S20_R2_001_trimmed.5bp_3prime.fq.gz
         12999258153 100%   34.86MB/s    0:05:55 (xfer#42, to-check=3/46)
        TRIM_cat/Sealice_F2_S22_R1_001_trimmed.5bp_3prime.fq.gz
          8873822837 100%   36.13MB/s    0:03:54 (xfer#43, to-check=2/46)
        TRIM_cat/Sealice_F2_S22_R2_001_trimmed.5bp_3prime.fq.gz
          9376290902 100%   35.48MB/s    0:04:11 (xfer#44, to-check=1/46)
        TRIM_cat/slurm-1141613.out
                  72 100%    0.10kB/s    0:00:00 (xfer#45, to-check=0/46)
      		
        sent 1012 bytes  received 109567734710 bytes  37879943.21 bytes/sec
        total size is 109540986764  speedup is 1.00
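
For reference, the lane concatenation in step 2 can be sketched as below. This is not the mox script itself. It assumes the trimmed inputs carry `_L001_` and `_L002_` lane tags and end in `.fq.gz`, and that the merged name simply drops the lane tag (which is consistent with the file names in the rsync listing above). `cat_lanes` is a hypothetical helper name.

```shell
# Concatenate the lane 1 and lane 2 gzip files for every sample in a directory.
# Usage: cat_lanes <input_dir> <output_dir>
cat_lanes() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  for lane1 in "$in_dir"/*_L001_*.fq.gz; do
    [ -e "$lane1" ] || continue               # the glob matched nothing
    name=$(basename "$lane1")
    lane2="$in_dir/$(printf '%s\n' "$name" | sed 's/_L001_/_L002_/')"
    merged="$out_dir/$(printf '%s\n' "$name" | sed 's/_L001_/_/')"
    if [ ! -e "$lane2" ]; then
      echo "no L002 partner for $name" >&2
      return 1
    fi
    # gzip streams can be joined byte for byte; the result is still valid gzip,
    # so there is no need to decompress and recompress.
    cat "$lane1" "$lane2" > "$merged"
  done
}
```

A quick sanity check after running it is to compare `gzip -dc merged.fq.gz | wc -l` against the sum of the line counts of the two lane files.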
      
  3. Alignment parameters test
    • OUTCOME: summary file shows very poor alignment in all parameters tested for both sea lice samples
  • To see if the poor alignment was specific to the sea lice samples or genome, I ran the alignment test on the Salmon data (mox job 20190806_BmrkCmpAS_Ssalar_100K.sh)
    • OUTCOME: summary file showed the Salmon data also aligns very poorly, so the problem is not sample or genome specific
      • I canceled the mox job before it ran all test parameters (it only completed AS-0.2, AS-0.6, and AS-1, corresponding to columns 2-4 in the summary file)
  • To test if my 5bp blunt trimming was causing problems, I ran the alignment parameters test on sea lice data that had only been adapter trimmed and lanes not concatenated
  • I found this article discussing ‘dovetailing’ reads (where one or both reads seem to extend past the start of the mate). It says that dovetailing reads are considered discordant and are not reported.

  • So I tested both the trimming and the alignment parameters specified in the article on one set of read pairs

      (base) ostrich:TRIM_5bp strigg$ /Users/strigg/anaconda3/bin/TrimGalore-0.6.0/trim_galore \
      > --output_dir /Users/strigg/Desktop/Salmo_Calig/TRIM \
      > --fastqc_args \
      > "--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM/FASTQC" \
      > --clip_R1 9 \
      > --clip_R2 9 \
      > --paired \
      > --path_to_cutadapt /Users/strigg/anaconda3/bin/cutadapt \
      > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz \
      > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz 
    
  • I copied these trimmed reads 16C_26psu_1_S13_L001_R1 and 16C_26psu_1_S13_L001_R2 to mox and ran 20190806_BmrkCmpAS_Ssalar_100K_test.sh
    • OUTCOME: summary file showed these parameters greatly improved mapping
  • To see if --clip_R1 and --clip_R2 could be reduced from 9bp to 5bp (as suggested by Zymo) without losing mapping efficiency, I ran the trimming again on the same two files (full output appended at end)

      (base) ostrich:TRIM_5bp strigg$ /Users/strigg/anaconda3/bin/TrimGalore-0.6.0/trim_galore \
      > --output_dir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp \
      > --fastqc_args \
      > "--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC" \
      > --clip_R1 5 \
      > --clip_R2 5 \
      > --paired \
      > --path_to_cutadapt /Users/strigg/anaconda3/bin/cutadapt \
      > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz \
      > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz 
    
  • I copied these trimmed reads (R1 and R2) to mox and ran 20190806_Bmrk_CompareAS_Ssalar_100K_test5bp.sh
    • OUTCOME: summary file showed nearly identical mapping efficiencies so clipping 5bp is sufficient.
  • I am currently running TrimGalore with the same parameters as directly above through the Jupyter notebook 20190722_SalmonSeaLice_FASTQC_TRIM.ipynb on ostrich.
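
Applying the 5bp settings above to every read pair could be sketched as follows. This is not the contents of the notebook. It assumes the raw files follow the `*_R1_001.fastq.gz` / `*_R2_001.fastq.gz` naming seen above and that `trim_galore` is on the PATH (the `--path_to_cutadapt` option is left out for that reason). `maybe_run` and `trim_pairs` are hypothetical helper names. The sketch creates the FastQC directory first, because the appended log reports that the specified FastQC output directory did not exist.

```shell
# Print a command by default; execute it only when RUN=1 is set.
# The printed form is for inspection only (argument quoting is not preserved).
maybe_run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Trim every R1/R2 pair found in a directory.
# Usage: trim_pairs <fastq_dir> <output_dir> [clip_bp]
trim_pairs() {
  fastq_dir=$1
  out_dir=$2
  clip=${3:-5}
  if [ "${RUN:-0}" = 1 ]; then
    mkdir -p "$out_dir/FASTQC" || return 1    # FastQC needs this to exist already
  fi
  for r1 in "$fastq_dir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    r2="$fastq_dir/$(basename "$r1" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/')"
    if [ ! -e "$r2" ]; then
      echo "no R2 partner for $r1" >&2
      return 1
    fi
    maybe_run trim_galore \
      --output_dir "$out_dir" \
      --fastqc_args "--outdir $out_dir/FASTQC" \
      --clip_R1 "$clip" \
      --clip_R2 "$clip" \
      --paired \
      "$r1" "$r2" || return 1
  done
}
```

Running `trim_pairs /path/to/fastq /path/to/TRIM_5bp` prints one command per pair for review; prefixing the call with `RUN=1` executes them.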

NEXT STEPS:

  1. Copy newly trimmed reads to mox
  2. Run alignment parameter test on subset of all trimmed reads
  3. Decide which parameters to go with
  4. Start full Bismark analysis on all data
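
Step 2 could be set up with a small command generator like the one below. This is a sketch under assumptions, not the contents of the 20190806_BmrkCmpAS scripts: it reads the AS-0.2, AS-0.6, and AS-1 labels above as Bismark `--score_min L,0,-0.2`, `L,0,-0.6`, and `L,0,-1`, and reads "100K" as aligning only the first 100,000 read pairs (`-u 100000`). Any other options used in the real scripts are not shown here. `score_min_sweep` and the paths in the usage line are hypothetical.

```shell
# Print one bismark command per minimum-alignment-score setting.
# Usage: score_min_sweep <genome_dir> <R1.fq.gz> <R2.fq.gz> <out_dir> [n_reads]
score_min_sweep() {
  genome=$1
  r1=$2
  r2=$3
  out_dir=$4
  n_reads=${5:-100000}
  for slope in 0.2 0.6 1; do
    # Each setting writes to its own output folder so results are not overwritten.
    printf 'bismark --genome %s -u %s --score_min L,0,-%s -1 %s -2 %s -o %s\n' \
      "$genome" "$n_reads" "$slope" "$r1" "$r2" "$out_dir/AS-$slope"
  done
}
```

For example, `score_min_sweep genome/ R1_val_1.fq.gz R2_val_2.fq.gz test_out` prints three commands that can be reviewed and then pasted into an sbatch script.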

Full output from TrimGalore (trim adapters + 5bp):

(base) ostrich:TRIM_5bp strigg$ /Users/strigg/anaconda3/bin/TrimGalore-0.6.0/trim_galore \
> --output_dir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp \
> --fastqc_args \
> "--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC" \
> --clip_R1 5 \
> --clip_R2 5 \
> --paired \
> --path_to_cutadapt /Users/strigg/anaconda3/bin/cutadapt \
> /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz \
> /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz 
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: '/Users/strigg/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/Users/strigg/anaconda3/bin/cutadapt --version')
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Output directory /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/ doesn't exist, creating it for you...

Output will be written into the directory: /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type	Count	Sequence	Sequences analysed	Percentage
Illumina	538094	AGATCGGAAGAGC	1000000	53.81
Nextera	0	CTGTCTCTTATA	1000000	0.00
smallRNA	0	TGGAATTCTCGG	1000000	0.00
Using Illumina adapter for trimming (count: 538094). Second best hit was Nextera (count: 0)

Writing report to '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/16C_26psu_1_S13_L001_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.0
Cutadapt version: 2.4
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC'
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz


  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz <<< 
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.6.8
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 852.63 s (38 us/read; 1.56 M reads/minute).

=== Summary ===

Total reads processed:              22,159,935
Reads with adapters:                17,366,785 (78.4%)
Reads written (passing filters):    22,159,935 (100.0%)

Total basepairs processed: 3,323,990,250 bp
Quality-trimmed:              20,636,466 bp (0.6%)
Total written (filtered):  2,599,584,885 bp (78.2%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17366785 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 29.9%
  C: 14.0%
  G: 20.5%
  T: 35.6%
  none/other: 0.0%

Overview of removed sequences
length	count	expect	max.err	error counts
1	2904827	5539983.8	0	2904827
2	423079	1384995.9	0	423079
3	231866	346249.0	0	231866
4	143449	86562.2	0	143449
5	117570	21640.6	0	117570
6	115529	5410.1	0	115529
7	96119	1352.5	0	96119
8	119364	338.1	0	119364
9	118406	84.5	0	116886 1520
10	112051	21.1	1	106032 6019
11	122987	5.3	1	113734 9253
12	119477	1.3	1	111842 7635
13	120023	0.3	1	112246 7777
14	122963	0.3	1	114085 8878
15	127528	0.3	1	119209 8319
16	126445	0.3	1	117002 9443
17	152761	0.3	1	140488 12273
18	129037	0.3	1	121213 7824
19	112889	0.3	1	105773 7116
20	133616	0.3	1	125004 8612
21	146516	0.3	1	136291 10225
22	126419	0.3	1	119045 7374
23	145391	0.3	1	135716 9675
24	136894	0.3	1	125670 11224
25	145117	0.3	1	135313 9804
26	134897	0.3	1	126717 8180
27	148757	0.3	1	137734 11023
28	145635	0.3	1	135492 10143
29	153188	0.3	1	140948 12240
30	163373	0.3	1	154030 9343
31	130689	0.3	1	122129 8560
32	157390	0.3	1	147156 10234
33	145226	0.3	1	135269 9957
34	158524	0.3	1	147826 10698
35	161582	0.3	1	148734 12848
36	173098	0.3	1	162838 10260
37	141921	0.3	1	133053 8868
38	164933	0.3	1	154468 10465
39	167185	0.3	1	155913 11272
40	175799	0.3	1	163207 12592
41	197239	0.3	1	176427 20812
42	165247	0.3	1	150517 14730
43	270149	0.3	1	254538 15611
44	85600	0.3	1	76961 8639
45	197628	0.3	1	175697 21931
46	723044	0.3	1	688287 34757
47	254889	0.3	1	240372 14517
48	114855	0.3	1	101708 13147
49	409546	0.3	1	390013 19533
50	125438	0.3	1	116192 9246
51	110069	0.3	1	99667 10402
52	381945	0.3	1	364475 17470
53	196688	0.3	1	188239 8449
54	63003	0.3	1	58174 4829
55	106109	0.3	1	101117 4992
56	55521	0.3	1	50154 5367
57	169184	0.3	1	161794 7390
58	35182	0.3	1	32212 2970
59	44244	0.3	1	39185 5059
60	173147	0.3	1	165425 7722
61	86440	0.3	1	82402 4038
62	40559	0.3	1	36479 4080
63	130914	0.3	1	123404 7510
64	122315	0.3	1	116337 5978
65	65805	0.3	1	60428 5377
66	148115	0.3	1	139100 9015
67	175930	0.3	1	163369 12561
68	293688	0.3	1	278052 15636
69	189726	0.3	1	179114 10612
70	75474	0.3	1	69457 6017
71	27345	0.3	1	23952 3393
72	97969	0.3	1	91968 6001
73	138145	0.3	1	130111 8034
74	143624	0.3	1	135279 8345
75	142674	0.3	1	134066 8608
76	141682	0.3	1	133308 8374
77	141205	0.3	1	132947 8258
78	133036	0.3	1	124966 8070
79	131630	0.3	1	123633 7997
80	128859	0.3	1	121116 7743
81	126119	0.3	1	118667 7452
82	123492	0.3	1	116262 7230
83	118847	0.3	1	111615 7232
84	115583	0.3	1	108642 6941
85	111409	0.3	1	104817 6592
86	107099	0.3	1	100545 6554
87	102731	0.3	1	96477 6254
88	101527	0.3	1	95245 6282
89	95383	0.3	1	89600 5783
90	95037	0.3	1	89230 5807
91	89836	0.3	1	84395 5441
92	84977	0.3	1	79865 5112
93	79479	0.3	1	74762 4717
94	76672	0.3	1	72010 4662
95	70676	0.3	1	66313 4363
96	66886	0.3	1	62766 4120
97	63105	0.3	1	59150 3955
98	58813	0.3	1	55101 3712
99	56801	0.3	1	52921 3880
100	51251	0.3	1	47973 3278
101	48131	0.3	1	44969 3162
102	43975	0.3	1	41134 2841
103	40749	0.3	1	38132 2617
104	37602	0.3	1	35059 2543
105	33601	0.3	1	31318 2283
106	30687	0.3	1	28622 2065
107	27550	0.3	1	25698 1852
108	25919	0.3	1	24135 1784
109	23271	0.3	1	21584 1687
110	20524	0.3	1	19077 1447
111	18563	0.3	1	17263 1300
112	16204	0.3	1	15036 1168
113	14110	0.3	1	13017 1093
114	12321	0.3	1	11394 927
115	10629	0.3	1	9884 745
116	9483	0.3	1	8756 727
117	8057	0.3	1	7440 617
118	7067	0.3	1	6491 576
119	5777	0.3	1	5264 513
120	4917	0.3	1	4511 406
121	4181	0.3	1	3800 381
122	3506	0.3	1	3174 332
123	2912	0.3	1	2640 272
124	2546	0.3	1	2297 249
125	2101	0.3	1	1902 199
126	1717	0.3	1	1548 169
127	1536	0.3	1	1361 175
128	1201	0.3	1	1095 106
129	1012	0.3	1	912 100
130	739	0.3	1	668 71
131	638	0.3	1	572 66
132	553	0.3	1	500 53
133	556	0.3	1	500 56
134	665	0.3	1	607 58
135	362	0.3	1	324 38
136	207	0.3	1	176 31
137	143	0.3	1	128 15
138	99	0.3	1	86 13
139	84	0.3	1	72 12
140	58	0.3	1	50 8
141	42	0.3	1	38 4
142	37	0.3	1	29 8
143	24	0.3	1	22 2
144	25	0.3	1	17 8
145	26	0.3	1	22 4
146	29	0.3	1	21 8
147	85	0.3	1	77 8
148	64	0.3	1	59 5
149	40	0.3	1	35 5
150	355	0.3	1	318 37

RUN STATISTICS FOR INPUT FILE: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz
=============================================
22159935 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/16C_26psu_1_S13_L001_R2_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.0
Cutadapt version: 2.4
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC'
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz


  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz <<< 
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.6.8
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 904.27 s (41 us/read; 1.47 M reads/minute).

=== Summary ===

Total reads processed:              22,159,935
Reads with adapters:                17,263,070 (77.9%)
Reads written (passing filters):    22,159,935 (100.0%)

Total basepairs processed: 3,323,990,250 bp
Quality-trimmed:              16,760,179 bp (0.5%)
Total written (filtered):  2,603,569,143 bp (78.3%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17263070 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 28.6%
  C: 12.4%
  G: 23.7%
  T: 35.3%
  none/other: 0.0%

Overview of removed sequences
length	count	expect	max.err	error counts
1	2980698	5539983.8	0	2980698
2	434290	1384995.9	0	434290
3	244115	346249.0	0	244115
4	143736	86562.2	0	143736
5	130207	21640.6	0	130207
6	112454	5410.1	0	112454
7	91383	1352.5	0	91383
8	130413	338.1	0	130413
9	102897	84.5	0	102495 402
10	113241	21.1	1	107719 5522
11	119188	5.3	1	111572 7616
12	120663	1.3	1	112735 7928
13	133942	0.3	1	125965 7977
14	126547	0.3	1	117452 9095
15	102830	0.3	1	96138 6692
16	149163	0.3	1	139761 9402
17	119889	0.3	1	111631 8258
18	113552	0.3	1	106594 6958
19	144826	0.3	1	135202 9624
20	135016	0.3	1	126722 8294
21	118040	0.3	1	110327 7713
22	135122	0.3	1	126795 8327
23	139623	0.3	1	130868 8755
24	139327	0.3	1	130312 9015
25	156486	0.3	1	147346 9140
26	131367	0.3	1	122873 8494
27	144343	0.3	1	134126 10217
28	135732	0.3	1	128445 7287
29	144460	0.3	1	135341 9119
30	143460	0.3	1	135475 7985
31	154935	0.3	1	145507 9428
32	140211	0.3	1	132501 7710
33	154714	0.3	1	145023 9691
34	197197	0.3	1	185506 11691
35	114069	0.3	1	107701 6368
36	154713	0.3	1	145095 9618
37	166945	0.3	1	157027 9918
38	155394	0.3	1	147358 8036
39	151879	0.3	1	143914 7965
40	168764	0.3	1	158797 9967
41	162964	0.3	1	154146 8818
42	166211	0.3	1	157158 9053
43	152395	0.3	1	144005 8390
44	174656	0.3	1	164812 9844
45	186871	0.3	1	176811 10060
46	135036	0.3	1	127666 7370
47	168636	0.3	1	159853 8783
48	166288	0.3	1	157802 8486
49	174530	0.3	1	165351 9179
50	164769	0.3	1	155907 8862
51	207765	0.3	1	197011 10754
52	147151	0.3	1	139795 7356
53	177051	0.3	1	168817 8234
54	133625	0.3	1	126900 6725
55	174567	0.3	1	166338 8229
56	178950	0.3	1	169886 9064
57	168945	0.3	1	161037 7908
58	161931	0.3	1	154618 7313
59	165789	0.3	1	157874 7915
60	171736	0.3	1	163328 8408
61	175826	0.3	1	167046 8780
62	192123	0.3	1	182534 9589
63	252743	0.3	1	241513 11230
64	207112	0.3	1	199124 7988
65	135660	0.3	1	129354 6306
66	145866	0.3	1	139110 6756
67	145748	0.3	1	138731 7017
68	146161	0.3	1	139522 6639
69	144487	0.3	1	137768 6719
70	154292	0.3	1	147160 7132
71	158353	0.3	1	151327 7026
72	150227	0.3	1	143181 7046
73	157376	0.3	1	150704 6672
74	144388	0.3	1	138042 6346
75	137234	0.3	1	131094 6140
76	133825	0.3	1	127901 5924
77	133664	0.3	1	127640 6024
78	127632	0.3	1	121889 5743
79	122934	0.3	1	117397 5537
80	122791	0.3	1	117317 5474
81	119094	0.3	1	113976 5118
82	115048	0.3	1	109953 5095
83	110957	0.3	1	106119 4838
84	107760	0.3	1	102959 4801
85	105919	0.3	1	100973 4946
86	104575	0.3	1	99936 4639
87	105119	0.3	1	100537 4582
88	102473	0.3	1	97892 4581
89	90861	0.3	1	86820 4041
90	87725	0.3	1	83791 3934
91	82796	0.3	1	79052 3744
92	76768	0.3	1	73386 3382
93	74497	0.3	1	71224 3273
94	71670	0.3	1	68502 3168
95	67300	0.3	1	64357 2943
96	63751	0.3	1	61088 2663
97	58781	0.3	1	56250 2531
98	54871	0.3	1	52462 2409
99	53000	0.3	1	50773 2227
100	48407	0.3	1	46381 2026
101	45093	0.3	1	43177 1916
102	40752	0.3	1	38996 1756
103	37584	0.3	1	35906 1678
104	35749	0.3	1	34226 1523
105	31850	0.3	1	30464 1386
106	29282	0.3	1	28064 1218
107	26084	0.3	1	24956 1128
108	23726	0.3	1	22669 1057
109	21646	0.3	1	20704 942
110	19323	0.3	1	18515 808
111	18235	0.3	1	17436 799
112	15797	0.3	1	15095 702
113	13713	0.3	1	13089 624
114	11968	0.3	1	11416 552
115	10124	0.3	1	9656 468
116	8777	0.3	1	8354 423
117	7339	0.3	1	6995 344
118	6452	0.3	1	6137 315
119	5422	0.3	1	5101 321
120	4499	0.3	1	4281 218
121	3883	0.3	1	3680 203
122	3313	0.3	1	3156 157
123	2807	0.3	1	2669 138
124	2400	0.3	1	2285 115
125	2022	0.3	1	1907 115
126	1650	0.3	1	1555 95
127	1434	0.3	1	1346 88
128	1129	0.3	1	1055 74
129	969	0.3	1	902 67
130	682	0.3	1	644 38
131	587	0.3	1	552 35
132	523	0.3	1	500 23
133	524	0.3	1	494 30
134	633	0.3	1	604 29
135	337	0.3	1	318 19
136	197	0.3	1	186 11
137	147	0.3	1	133 14
138	92	0.3	1	88 4
139	79	0.3	1	71 8
140	57	0.3	1	52 5
141	37	0.3	1	35 2
142	37	0.3	1	35 2
143	25	0.3	1	20 5
144	20	0.3	1	20
145	26	0.3	1	25 1
146	24	0.3	1	23 1
147	85	0.3	1	85
148	61	0.3	1	59 2
149	39	0.3	1	36 3
150	325	0.3	1	310 15

RUN STATISTICS FOR INPUT FILE: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
=============================================
22159935 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Validate paired-end files 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz and 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz
file_1: 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz, file_2: 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz


>>>>> Now validing the length of the 2 paired-end infiles: 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz and 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz <<<<<
Writing validated paired-end read 1 reads to 16C_26psu_1_S13_L001_R1_001_val_1.fq.gz
Writing validated paired-end read 2 reads to 16C_26psu_1_S13_L001_R2_001_val_2.fq.gz

Total number of sequences analysed: 22159935

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 42248 (0.19%)


  >>> Now running FastQC on the validated data 16C_26psu_1_S13_L001_R1_001_val_1.fq.gz<<<

Specified output directory '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC' does not exist

  >>> Now running FastQC on the validated data 16C_26psu_1_S13_L001_R2_001_val_2.fq.gz<<<

Specified output directory '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC' does not exist
Deleting both intermediate output files 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz and 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz
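
The headline numbers in the log above (reads processed, reads with adapters) can be pulled out of the per-file trimming reports instead of being read off by eye. A minimal sketch, assuming the `*_trimming_report.txt` files contain the same cutadapt summary lines shown above; `report_summary` is a hypothetical helper name.

```shell
# Print one tab-separated line per trimming report:
# report name, total reads, reads with adapters, percent with adapters.
# Usage: report_summary <report.txt> [more reports...]
report_summary() {
  for report in "$@"; do
    awk -v name="$(basename "$report")" '
      /^Total reads processed:/ { total = $NF }
      /^Reads with adapters:/   {
        adapters = $(NF - 1)                  # the count, e.g. 17,366,785
        pct = $NF                             # the percentage, e.g. (78.4%)
        gsub(/[()]/, "", pct)                 # drop the surrounding parentheses
      }
      END { printf "%s\t%s\t%s\t%s\n", name, total, adapters, pct }
    ' "$report"
  done
}
```

Running `report_summary TRIM_5bp/*_trimming_report.txt` gives a small table that is easy to paste into the notebook next to the alignment summaries.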

====================================================================================================
Written on August 6, 2019