Data Analysis

From Bioinformatics Core Wiki

Perl Scripts

Use at your own risk (UAYOR)! Your mileage may vary (YMMV)! Email bug reports to bioinformatics {dot} core {at} ucdavis {dot} edu.

  • .. simple report on possible quality encoding formats for a fastq file (Joe Fass)
  • .. obtain length histogram, GC-content, etc. for sequences in a fasta-format file (Brad Sickler / Joe Fass)
  • .. calculate "Nx" stat for a set of sequences in fasta format (Joe Fass)
  • .. remove newlines within sequences so that each sequence sits on a single line following its header line, for all sequences in a fasta-format file (Joe Fass)
  • .. need fastq, and you only have fasta? fake it! (Joe Fass)
  • .. reverse complement a set of fasta-format sequences (Joe Fass)
  • .. reverse complement a set of single-line fastq sequences (Joe Fass)
  • .. trim paired-end fastq files based on quality using a variety of trimming methods (Nikhil Joshi)
  • .. cut out a sub-sequence from sequences and qualities in a fasta/q-format file (Joe Fass)
  • .. sequence qualitative analysis for fasta and fastq files (Hans Vasquez-Gross)
  • .. convert Illumina (pipeline 1.3 and above) fastq format to Sanger fastq format (cat sequence.txt | ./ > sequence.fastq) (Joe Fass)
  • .. trim Illumina read 3' ends at the first "bad" base .. takes and produces fastq (cat sequence.fastq | ./ > sequence.trimmed.fastq) (Joe Fass)
  • .. trimming script for one-line fastq, based on Heng Li's clipping algorithm implemented in bwa (for all-bad reads, substitutes one "N") (Joe Fass)
  • .. trimming script for one-line fastq, using a sliding window; discards reads that get trimmed too short (Joe Fass)
  • .. get subset of fastq records based on fraction or number of records desired (Nikhil Joshi)
  • .. convert Illumina's "export.txt" format into fastq (no quality conversion, so equivalent to their "sequence.txt" files) (Joe Fass)
  • .. rudimentary 3'-adapter trimming; allows 1-mismatch down to a minimum length of adapter 5'-end (Joe Fass)
  • .. generate SNP sequences in a tab-separated-value format, including flanking sequence from read consensus or reference genome when no reads mapped (Joe Fass)
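
A few of the operations described above are simple enough to sketch in a handful of lines. The following is a minimal Python illustration, not the code of the Perl scripts listed above (their internals are not shown on this page). It covers the "Nx" statistic, reverse complementing a sequence together with its qualities, and shifting Illumina 1.3+ quality characters (Phred+64) to Sanger encoding (Phred+33). All function names are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: function names are hypothetical, and this is
# not the implementation used by the Perl scripts listed above.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def nx_stat(lengths, x=50):
    """Return the Nx length for a collection of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least x percent of all bases (x=50 gives N50).
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


def reverse_complement(seq):
    """Reverse complement a nucleotide sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_fastq(seq, qual):
    """Reverse complement a read; the quality string is simply reversed."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return reverse_complement(seq), qual[::-1]


def illumina_to_sanger(qual):
    """Shift Phred+64 quality characters (Illumina 1.3+) to Phred+33 (Sanger)."""
    if any(ord(c) < 64 for c in qual):
        raise ValueError("quality string does not look Phred+64 encoded")
    return "".join(chr(ord(c) - 31) for c in qual)


if __name__ == "__main__":
    print(nx_stat([80, 70, 50, 40, 30, 20]))         # 70
    print(reverse_complement("ACCGTN"))              # NACGGT
    print(reverse_complement_fastq("ACGT", "IIH#"))  # ('ACGT', '#HII')
    print(illumina_to_sanger("hhB@"))                # II#!
```

The quality conversion rejects characters below ASCII 64 because such a string cannot be Phred+64; this is a cheap sanity check, not a full format detector like the quality-encoding report script in the list.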

Complex Data Analysis

  • Analyzing data sets across a broad biological spectrum, from DNA and protein to complex, family, system, and population levels, as well as dynamic features such as expression and simulation. Activities include data mining, statistics, and functional and evolutionary analysis.
  • Building custom tools and databases, and providing specialized programming and algorithm support, to facilitate data analysis.
  • MPI Blast

Software Tips