I am trying to align the following 8 sequences using clustalo in an Ubuntu terminal with the following command, but I am not getting the right output:

clustalo -i sequence.fasta -o output.clu

My input file is sequence.fasta, with 8 sequences I downloaded from BLAST.

My output file: output.clu

The output I was hoping for: outputOnline.clustal

I would also appreciate it if someone could tell me how to share this kind of data in a question, because the way I have done it obviously does not seem appropriate.

You just need to specify the output format with the --outfmt flag.

clustalo -i sequence.fasta -o output.clu --outfmt=clu

should give you the desired output.
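For scripting the same call, here is a minimal Python sketch, assuming clustalo is on $PATH. The helper names clustalo_command and run_clustalo are hypothetical. The optional --force flag, which lets clustalo overwrite an existing output file, is an extra not mentioned in the answer.

```python
import shutil
import subprocess
from typing import List


def clustalo_command(infile: str, outfile: str, outfmt: str = "clu", force: bool = False) -> List[str]:
    """Build a Clustal Omega command line with an explicit output format."""
    # Long options such as --outfmt take two dashes; "clu" selects Clustal format.
    cmd = ["clustalo", "-i", infile, "-o", outfile, f"--outfmt={outfmt}"]
    if force:
        cmd.append("--force")  # allow overwriting an existing output file
    return cmd


def run_clustalo(infile: str, outfile: str, force: bool = False) -> None:
    """Run clustalo from $PATH, raising if it is missing or exits with an error."""
    if shutil.which("clustalo") is None:
        raise FileNotFoundError("clustalo is not on $PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(clustalo_command(infile, outfile, force=force), check=True)


if __name__ == "__main__":
    print(" ".join(clustalo_command("sequence.fasta", "output.clu")))
```

Run as a script, it prints the command line `clustalo -i sequence.fasta -o output.clu --outfmt=clu` without executing it.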

How to share data?

Try using an abbreviated example that people can copy and paste. For longer data you could use an external hosting service, but the URL might expire.




“Input/output error” when accessing a directory

I want to list and remove the contents of a directory on a removable hard drive, but I get an "Input/output error":

What could the problem be?

How can I recover or remove the directory pic and all of its contents?

My OS is Ubuntu 12.04, and the removable hard drive has an NTFS filesystem. Other directories on the removable hard drive that neither contain pic nor are inside it work fine.

The last part of the dmesg output after I tried to list the contents of the directory:

How can I send stdout to multiple commands?

There are some commands which filter or act on input, and then pass it along as output, I think usually to stdout - but some commands will just take the stdin and do whatever they do with it, and output nothing.

I'm most familiar with OS X, so the two that come to mind immediately are pbcopy and pbpaste, which provide access to the system clipboard.

Anyhow, I know that if I want to take stdout and split it so the output goes to both stdout and a file, I can use the tee command. I also know a little about xargs, but I don't think that's what I'm looking for.

I want to know how I can split stdout to go between two (or more) commands. For example:

There is probably a better example than that one, but I really am interested in knowing how I can send stdout to a command that does not relay it and while keeping stdout from being "muted" - I'm not asking about how to cat a file and grep part of it and copy it to the clipboard - the specific commands are not that important.

Also, I'm not asking how to send this to a file and stdout. This may be a "duplicate" question (sorry), but I did some looking and could only find similar ones asking how to split between stdout and a file, and the answers to those questions seemed to be tee, which I don't think will work for me.

Finally, you may ask "why not just make pbcopy the last thing in the pipe chain?" and my response is 1) what if I want to use it and still see the output in the console? 2) what if I want to use two commands which do not output stdout after they process the input?

Oh, and one more thing: I realize I could use tee and a named pipe (mkfifo), but I was hoping for a way to do this inline, concisely, without any prior setup :)
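To make the tee idea concrete, here is a rough Python sketch of a tee-like fan-out. It copies one input stream to the stdin of several commands and, optionally, to stdout. It is not inline shell and not an answer from the thread. The helper names fan_out and tee_to_commands are hypothetical.

```python
import subprocess
import sys
from typing import BinaryIO, List, Sequence


def fan_out(source: BinaryIO, sinks: List[BinaryIO], chunk_size: int = 65536) -> int:
    """Copy everything read from source to every sink; return the bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # EOF
            break
        for sink in sinks:
            sink.write(chunk)
        total += len(chunk)
    for sink in sinks:
        sink.flush()
    return total


def tee_to_commands(source: BinaryIO, commands: Sequence[Sequence[str]], echo: bool = True) -> List[int]:
    """Feed source to each command's stdin (and to our stdout if echo); return exit codes."""
    procs = [subprocess.Popen(list(cmd), stdin=subprocess.PIPE) for cmd in commands]
    sinks = [p.stdin for p in procs]
    if echo:
        sinks.append(sys.stdout.buffer)
    fan_out(source, sinks)
    for p in procs:
        p.stdin.close()  # send EOF so each command can finish
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical OS X usage: copy stdin to the clipboard while still printing it.
    tee_to_commands(sys.stdin.buffer, [["pbcopy"]])
```

A command that exits without reading all of its input will cause a BrokenPipeError on the next write. A fuller version would catch that error and drop the corresponding sink.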

PL2303/PL2303X USB-Serial driver

I have a USB-Serial adapter with the PL2303X chip that I use to connect hardware to a Linux host. The device appears as connected in lsusb, but I cannot send or receive data. I've tried sending and receiving data with moserial and PuTTY. When I send commands, no response data is returned, and I see no changes in dmesg or /var/log/syslog.

The same USB-Serial adapter connects and works on the same Dell laptop model running Windows 10. On Windows it can receive commands and return data when configured with the port settings below.

I'd like to use the following port settings, though I've also tried variations of them in moserial and PuTTY to no avail (e.g. no parity, different baud rates, hardware/software handshaking):

Windows also works when I change the above settings (e.g. no parity, 7 data bits, lower/higher baud rate).

I need to be able to send commands and receive data similar to how the device works using Windows, preferably with the above port settings.

Any ideas on how to fix or debug this? I appreciate it.
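One way to take moserial and PuTTY out of the picture is a few lines of Python using the standard termios and tty modules. This is only a debugging sketch. The device path /dev/ttyUSB0 and the 9600 baud, 8N1 settings are placeholders, not the port settings referred to above. open_serial and send_command are hypothetical helper names.

```python
import os
import termios
import tty


def open_serial(path: str, baud: int = termios.B9600, timeout_ds: int = 10) -> int:
    """Open a serial TTY in raw mode with 8 data bits, no parity, 1 stop bit."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    tty.setraw(fd)  # raw mode: CS8, parity off, no echo, no line buffering
    attrs = termios.tcgetattr(fd)  # [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[2] |= termios.CLOCAL | termios.CREAD  # ignore modem control lines, enable receiver
    attrs[2] &= ~termios.CSTOPB  # one stop bit
    attrs[4] = attrs[5] = baud  # input and output speed (placeholder value)
    attrs[6][termios.VMIN] = 0  # read() returns as soon as any byte arrives...
    attrs[6][termios.VTIME] = timeout_ds  # ...or after timeout_ds tenths of a second
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def send_command(fd: int, command: bytes, max_reply: int = 256) -> bytes:
    """Write a command, then return whatever reply arrives before the read timeout.

    Returns b"" if nothing arrives within the timeout.
    """
    os.write(fd, command)
    return os.read(fd, max_reply)


if __name__ == "__main__":
    port = open_serial("/dev/ttyUSB0")  # placeholder path for the PL2303 adapter
    print(send_command(port, b"\r"))  # placeholder command
    os.close(port)
```

If this also reads nothing back, the terminal programs are probably not the cause.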

lsusb output identifies the device as Bus 001 Device 016: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port

I believe the adapter has the PL2303X chip rather than the PL2303 (source: PL2303 & Pl2303x usb serial device).

I notice the messages "not an MTP device" and "unhandled action 'bind'" in /var/log/syslog:

I found an old patch (Linux kernel module patch for Prolific PL-2303X USB-serial adapter), though it mentions that the main kernel tree includes PL-2303X support starting from 2.6.8.

dmesg after plugging in the device:

I have also seen the error sending break = -19 message below, but I am having trouble reproducing it:

I saw this question over on Stack Overflow, but I didn't like any of the answers, and it really is a question that belongs here on U&L anyway.

Basically, an inode is used for each file on the filesystem, so running out of inodes generally means you've got a lot of small files lying around. So the question really becomes, "what directory has a large number of files in it?"

In this case, the filesystem we care about is the root filesystem /, so we can use the following command:

This will dump a list of every directory on the filesystem prefixed with the number of files (and subdirectories) in that directory. Thus the directory with the largest number of files will be at the bottom.

In my case, this turns up the following:

So basically /var/spool/postfix/maildrop is consuming all the inodes.

*Note: this answer has three caveats that I can think of. First, it does not properly handle anything with newlines in the path. I know my filesystem has no files with newlines, and since this is only being used for human consumption, the potential issue isn't worth solving. One can always switch to NUL-terminated output and use the -z options for the sort and uniq commands above, as follows:

Optionally, you can add head -zn10 to the command to get the top 10 directories by inode usage.

Second, it does not handle the case where the files are spread out among a large number of directories. This isn't likely, though, so I consider the risk acceptable. Third, it will count hard links to the same file (which use only one inode) several times. Again, this is unlikely to give false positives.*
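For comparison, here is a Python sketch of a similar per-directory count that sidesteps the first and third caveats. Names are handled as strings, so newlines are harmless, and hard links can optionally be counted once. It stays on the starting filesystem by comparing st_dev, similar to find -xdev. The function name inode_usage is hypothetical, and this is not the command from the answer.

```python
import os
from collections import Counter
from typing import Set, Tuple


def _same_device(path: str, dev: int) -> bool:
    """True if path (not following symlinks) lives on device dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def inode_usage(root: str, dedupe_hard_links: bool = False) -> Counter:
    """Count entries (files and subdirectories) per directory under root.

    The walk does not descend into other filesystems, similar to find -xdev.
    Names are plain Python strings, so newlines in paths are harmless.
    """
    root_dev = os.lstat(root).st_dev
    seen: Set[Tuple[int, int]] = set()
    counts: Counter = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        counts[dirpath] += 0  # list empty directories too
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or cannot be read
            key = (st.st_dev, st.st_ino)
            if dedupe_hard_links and key in seen:
                continue  # another hard link to an inode already counted
            seen.add(key)
            counts[dirpath] += 1
        # Prune mount points so the walk stays on root's filesystem.
        dirnames[:] = [d for d in dirnames if _same_device(os.path.join(dirpath, d), root_dev)]
    return counts
```

For example, inode_usage("/").most_common(10) lists the ten directories with the most entries. With dedupe_hard_links=True, an inode with several links is credited to whichever directory os.walk reaches first.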

The key reason I didn't like any of the answers on Stack Overflow is that they all cross filesystem boundaries. Since my issue was on the root filesystem, this means they would traverse every single mounted filesystem. Throwing -xdev on the find commands wouldn't even work properly.
For example, the most upvoted answer is this one:

If we change this instead to

even though /mnt/foo is a mount, it is also a directory on the root filesystem, so it'll turn up in find . -xdev -type d, and then it'll get passed to ls -a $i, which will dive into the mount.

The find in my answer instead lists the directory of every single file on the mount. So basically with a file structure such as:


Nanopore sequencing is a rapidly developing technology in terms of both sequencing technology and data analysis. For structural variant (SV) analysis, several new aligners and SV callers have been developed to leverage the long-read sequencing data. In addition, assembly-based approaches can also be used for SV identification. We have established a workflow for evaluating mappers and SV callers. We found that SV callers’ performance diverges between SV types. Therefore, our recommendations are tailored to the specific applications. For an initial analysis, we recommend minimap2 and Sniffles due to their high speed and relatively balanced performance calling both insertions and deletions. For more detailed analysis, we recommend running multiple tools and integrating their results for the best performance. When a high-quality true set can be defined, a machine learning approach, such as the one we proposed here, can be used to further improve the call set. Most analysis tools for nanopore sequencing are recently developed, and both accuracy and sensitivity can be improved. We expect resources from ONT and the nanopore sequencing community to accumulate as the technology improves and its user base grows. With more data being generated, better benchmark call sets will be available to more accurately assess the tool performance and facilitate future tool development.


What you have is a multithreaded application in which one thread appears to have hit a kernel bug.

Some analysis of the bug

You have tried to shut down the process mongod with ID 1160. The main thread with ID 1160 is in a zombie state waiting for the other threads in the process to die.

The thread ftdc with ID 1247 has hit a kernel bug at some point when calling the madvise system call which ended up in an infinite loop.

The kernel has a watchdog which noticed the stuck thread and logged a stacktrace to the kernel log. The stacktrace included the name of the thread. Because the name of the thread and the process were different in this case the connection between the two was not immediately obvious from the stacktrace.

That thread was likely stuck in that state before you even tried to shut down mongod in the first place.

When you later ran echo l > /proc/sysrq-trigger a stacktrace for the stuck thread was logged again. The two stacktraces are entirely identical, so it may very well have been stuck in the same place all along.

Reporting the bug

What you need to do is file a bug against the kernel. Remember to include the log output from the first time the watchdog detected that the thread was stuck.

Rebooting the system

In order to get this system back into a good state you will have to reboot. And there is a significant risk that a clean shutdown won't be possible.

If you attempt a clean shutdown you may need physical access to the machine in order to reset it unless you have a way to remotely power cycle the machine.

You can attempt an unclean reboot with echo b > /proc/sysrq-trigger which is about as disruptive as yanking the power from the machine. It will avoid the scenario where an attempted clean shutdown gets stuck and you can no longer ssh to the machine.

Whatever you do, expect a file system check to be needed during boot. So before attempting to shut down the machine in any way, you should stop services writing important data to disk and run a sync command.

There is a risk a sync command will get stuck. However since the stacktrace of the stuck process doesn't include anything file system or I/O related I consider that risk to be minor.

There is also a risk you will need physical access to the machine to get it through the boot due to file system inconsistencies. The probability of that is however less than the probability that an attempted clean shutdown will get stuck.

Output of clustalo (1.2.1) on Ubuntu 14.04

phylo-node: A Molecular Phylogenetic Toolkit using Node.js

Require module, for example:

Ensure executables are in your $PATH

Sequence Accession Numbers are collected from the command line, separated by a space (not a comma)

Node uses NCBI e-utilities to download sequences in fastA format:

Basic usage: node app.js inputfile [list of space separated accession numbers]
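The README does not show how the accessions are sent to NCBI, so the following is only a Python sketch of an equivalent E-utilities efetch request. The efetch.fcgi endpoint and its db, id, rettype and retmode parameters are standard E-utilities parameters. The choice of db="nuccore", the helper names, and treating the first argument as the file to write are assumptions.

```python
import sys
import urllib.request
from typing import Sequence
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions: Sequence[str], db: str = "nuccore") -> str:
    """Build an efetch URL that returns the given accessions in FASTA format."""
    params = {
        "db": db,                    # assumed: nucleotide records
        "id": ",".join(accessions),  # efetch takes a comma-separated id list
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions: Sequence[str], outfile: str) -> None:
    """Download the FASTA records and write them to outfile."""
    with urllib.request.urlopen(efetch_fasta_url(accessions)) as response:
        data = response.read()
    with open(outfile, "wb") as handle:
        handle.write(data)


if __name__ == "__main__":
    # Same shape as the README's CLI: a file name followed by space-separated accessions.
    fetch_fasta(sys.argv[2:], sys.argv[1])
```

The space-separated accessions from the command line become one comma-separated id parameter, so a single request fetches all of them.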

Get Sequence information in ASN.1 format

Sequence Accession Numbers are collected as per fastA sequences above using the genbank_json method:

Basic usage: node app.js inputfile [list of space separated accession numbers]

Download executable files:

Basic usage: node app.js URL

Note: objects for other tools i.e. PhyML, Clustal Omega, and MUSCLE contain their own methods for downloading binaries (see below)

Basic usage:

Point browser to localhost:8080

Note: to create a JBrowse server, it should be downloaded and configured as per the developer guidelines described here

Basic usage: node app.js index-file -U fastQ-reads

Basic usage: node app.js path-to-jar input-file [insert any flags (from flags below)]

ILLUMINACLIP Cut adapter and other illumina-specific sequences from the read
SLIDINGWINDOW Perform a sliding window trimming
LEADING Cut bases off the start of a read, if below a threshold quality
TRAILING Cut bases off the end of a read, if below a threshold quality
CROP Cut the read to a specified length
HEADCROP Cut the specified number of bases from the start of the read
MINLEN Drop the read if it is below a specified length
TOPHRED33 Convert quality scores to Phred-33
TOPHRED64 Convert quality scores to Phred-64
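The flags above are Trimmomatic step names. On the plain Trimmomatic command line, each step is written as NAME:arg1:arg2 after the SE or PE mode and the input and output files. Below is a Python sketch that assembles such a command for single-end reads. The jar name, file names and thresholds are placeholders, and this is not necessarily how phylo-node passes the flags.

```python
from typing import List, Sequence


def trimmomatic_se_command(jar: str, infile: str, outfile: str,
                           steps: Sequence[Sequence[object]], phred: int = 33) -> List[str]:
    """Build a single-end Trimmomatic command; each step is (NAME, arg1, arg2, ...)."""
    cmd = ["java", "-jar", jar, "SE", f"-phred{phred}", infile, outfile]
    # Each step becomes NAME:arg1:arg2, e.g. ("SLIDINGWINDOW", 4, 20) -> "SLIDINGWINDOW:4:20".
    cmd += [":".join(str(part) for part in step) for step in steps]
    return cmd


if __name__ == "__main__":
    print(" ".join(trimmomatic_se_command(
        "trimmomatic.jar", "reads.fq", "trimmed.fq",
        [("LEADING", 3), ("TRAILING", 3), ("SLIDINGWINDOW", 4, 20), ("MINLEN", 36)])))
```

Steps are applied in the order given, so the list order matters.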

Note: must have Java Runtime Environment and Trimmomatic jar

Download PhyML using this command

Basic usage: node app.js inputfile [insert any flags (from flags below)]

-d data_type
-n nb_data_sets
-b int
-m model
-f e
-t ts/tv_ratio
-v prop_invar
-c nb_subst_cat
-a gamma
-s move
-u user_tree_file
-o 'tlr'
--n_rand_starts num
--r_seed num

Basic usage: node app.js filename [-flags (from table below)]

-p3_settings_file= file_path
-output= file_path
-error= file_path

Download muscle executable

Basic usage: node app.js inputfile [insert any flags preceded by '-' sign and separated by a space (from flags below)]

-diags Find diagonals (faster for similar sequences)
-html Write output in HTML format (default FASTA)
-msf Write output in GCG MSF format (default FASTA)
-clw Write output in CLUSTALW format (default FASTA)
-clwstrict As -clw, with 'CLUSTAL W (1.81)' header
-quiet Do not write progress messages to stderr

Download Clustal Omega executable

Run Clustal Omega program

Basic usage: node app.js inputfile [insert any flags preceded by '--' sign and separated by a space]

--full Use full distance matrix for guide-tree calculation (might be slow; mBed is default)
--full-iter Use full distance matrix for guide-tree calculation during iteration (mBed is default)
--cluster-size Soft maximum of sequences in sub-clusters
--use-kimura Use Kimura distance correction for aligned sequences (default no)
--percent-id Convert distances into percent identities (default no)
--resno In Clustal format, print residue numbers (default no)
--wrap Number of residues before line-wrap in output
--iter Number of (combined guide tree/HMM) iterations
--max-guidetree-iterations Maximum guide tree iterations
--max-hmm-iterations Maximum number of HMM iterations

Basic usage: node app.js inputfile [insert any flags preceded by '-' sign and separated by a space]

-gpo Gap open penalty (default 6.0).
-gpe Gap extension penalty (default 0.9).
-p Wu-Manber algorithm used in both distance calculation and dynamic programming
-w Wu-Manber algorithm not used at all
-f fast heuristic alignment
-q 'quiet' - no messages are sent to standard error

Basic usage: node app.js input.aln input.fasta [insert any flags from below]

-h show help
-blockonly Show only user specified blocks
-output (clustal,paml,fasta,codon) Output format, default = clustal
-nogap remove columns with gaps and inframe stop codons
-nomismatch remove mismatched codons (mismatch between pep and cDNA) from the output
-codontable 1 (default),2,3,4,5,6,9,10,11,12,13,14,15,16,21,22,23 NCBI GenBank codon table
-html HTML output (only for the web server)
-nostderr No STDERR messages (only for the web server)

Note: must have Perl installed

Basic usage: node app.js input.paml input.trees [insert any flags from below]

-reoptimise 0 (no), 1(yes), 2(set branch lengths to random values)
-kappa value for kappa
-omega Value for omega (dN/dS)
-branopt 0: fixed, 1: optimise, 2: proportional
-codonf 0: F61/F60 1: F3x4 2: F1x4
-freqtype 0, 1, 2, 3
-positive_only 0(no) or 1(yes)
-nucleof 0: none, 1: adjust by a constant N_.
-aminof 0(constant), 1, 2
-timemem summary of real time and CPU time used 1:yes 0:no
-skipsitewise Skip sitewise estimation of omega

Basic usage: node app.js input.cnt [all parameters set by cnt file]

Basic usage: node app.js path-to-jar input-file [insert any flags (from flags below)]

-i alignment_filename
-t tree_filename (optional)
-o output_filename (optional)
-[matrix] Include matrix (Amino-acid)
-I models with a proportion of invariable sites
-G rate variation among sites and categories
-IG models with both +I and +G
-all-distributions rate variation among sites, categories and both
-ncat number of categories
-F models with empirical frequency estimation
-AIC Akaike Information Criterion
-BIC Bayesian Information Criterion
-AICC Corrected Akaike Information Criterion
-DT Decision Theory Criterion
-all 7-framework comparison table
-S Optimization strategy mode: [default: 0]
-s Tree search operation for ML search
-t1 Display best-model's newick tree
-t2 Display best-model's ASCII tree
-tc Display consensus tree with specified threshold
-threads Number of threads requested to compute
-verbose Verbose mode [default: false]

Note: must have Java Runtime Environment and ProtTest3 jar

Basic usage: node app.js path-to-jar input-file -o output-file [insert any flags (from flags below)]

-a Estimate model-averaged phylogeny for each active criterion
-t Base tree for likelihood calculations (e.g., -t BIONJ)
-o outputFile
-i Include models with a proportion invariable sites
-machinesfile Gets the processors per host from a machines file
-g numberOfRateCategories
-getPhylip Converts the input file into phylip format and exits
-G threshold
-h confidenceInterval
-AIC Akaike Information Criterion
-BIC Bayesian Information Criterion
-AICc Corrected Akaike Information Criterion
-hLRT Perform hierarchical likelihood ratio tests
-DT Calculate the decision theory criterion
-f Include models with unequal base frequencies
-H Information criterion for clustering search
-n logSuffix
-O Sets the hypothesis order for the hLRTs
-p Calculate the parameter importances
-tr Number of threads requested to compute
-v Do model averaging and parameter importances
-s Sets the number of substitution schemes
-u treefile
-uLNL Calculate delta AIC,AICc,BIC against unconstrained likelihood
-w Prints out the PAUP block
-z Strict consensus type for model-averaged phylogeny

Note: must have Java Runtime Environment and jModelTest2 jar

Commands can be chained in series to pipe data between applications:

The pipes directory contains the module for piping as well as example files. To execute the example:

pipe_example.js pipes the output from an NCBI fetch API call into the alignment software MUSCLE and aligns the DNA using default settings

Note: must have MUSCLE in $PATH for pipe example
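The pipe module itself is not shown here, but the idea of chaining programs can be sketched in Python with subprocess. pipe_through and chain are hypothetical names. The usage line assumes MUSCLE 3.x, which reads FASTA from stdin and writes the alignment to stdout when -in and -out are omitted.

```python
import subprocess
from typing import Sequence


def pipe_through(data: bytes, command: Sequence[str]) -> bytes:
    """Send data to a command's stdin and return its stdout (one stage of a pipe)."""
    result = subprocess.run(list(command), input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


def chain(data: bytes, *commands: Sequence[str]) -> bytes:
    """Run data through several commands in series, like cmd1 | cmd2 | ... in a shell."""
    for command in commands:
        data = pipe_through(data, command)
    return data


if __name__ == "__main__":
    fasta = b">seq1\nACGT\n>seq2\nACGA\n"
    # Assumes MUSCLE 3.x on $PATH; -quiet suppresses progress messages on stderr.
    print(chain(fasta, ["muscle", "-quiet"]).decode())
```

Unlike a shell pipeline, each stage here finishes before the next one starts, so the whole intermediate output is held in memory. That keeps the sketch simple and is fine for a handful of sequences.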

phylo-node was successfully tested on:

  • Microsoft Windows 7 Enterprise ver.6.1
  • MacOSX El Capitan ver.10.11.5
  • Linux Ubuntu 64-bit ver.14.04 LTS

To ensure all development dependencies are installed:

Note: if you get a permission error when running tests, you may have to chmod mocha

Skinner M.E., Uzilov A.V., Stein L.D., Mungall C.J., Holmes I.H. (2009). JBrowse: a next-generation genome browser. Genome Research, 19(9):1630-1638

Langmead B, Salzberg S. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4):357-9

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170

Guindon S., Dufayard J.F., Lefort V., Anisimova M., Hordijk W., Gascuel O. (2010). New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 59(3):307-21

Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M and Rozen SG. (2012). Primer3 - new capabilities and interfaces. Nucleic Acids Res. 40(15):e115

Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797

Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5:113

Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology 7:539

Lassmann T, Sonnhammer EL. (2005). Kalign--an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 6:298

Suyama M, Torrents D, Bork P (2006). PAL2NAL: robust conversion of protein sequence alignment into the corresponding codon alignments. Nucleic Acids Res. 34:W609-W612

Massingham T, Goldman N (2005) Detecting amino acid sites under positive selection and purifying selection. Genetics 169: 1753-1762

Yang, Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24(8):1586-91.

Yang, Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 13(5):555-6

Darriba D, Taboada GL, Doallo R, Posada D. (2011). ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics, 27:1164-1165

Darriba D, Taboada GL, Doallo R, Posada D. (2012). jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9(8), 772

All contributions are welcome.

If you have any problem or suggestion please open an issue here.

Copyright (c) 2016, dohalloran

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.


