Pipeline (Unix)
In Unix-like computer operating systems, a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly into the input (stdin) of the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration. The concept was invented by Douglas McIlroy for Unix shells and was named by analogy to a physical pipeline.
The pipeline concept and the vertical-bar notation were invented by Douglas McIlroy, one of the authors of the early command shells, after he noticed that much of the time they were processing the output of one program as the input to another. The idea was eventually ported to other operating systems, such as DOS, OS/2, Windows NT, and BeOS, often with the same notation.
Example
Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL. An explanation of what it does follows. (Some machines have /usr/share/dict/words instead of /usr/dict/words.)
    curl "http://en.wikipedia.org/wiki/Pipeline_(Unix)" | \
    sed 's/[^a-zA-Z ]/ /g' | \
    tr 'A-Z ' 'a-z\n' | \
    grep '[a-z]' | \
    sort -u | \
    comm -23 - /usr/dict/words
- First, curl obtains the HTML contents of a web page (could use wget on some systems).
- Second, sed replaces every character in the web page's content that is not a space or a letter with a space.
- Third, tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
- Fourth, grep includes only lines that contain at least one lowercase alphabetical character (removing any blank lines).
- Fifth, sort sorts the list of 'words' into alphabetical order, and the -u switch removes duplicates.
- Finally, comm compares two sorted files line by line. The -23 option suppresses the lines unique to the second file and the lines common to both, leaving only those found solely in the first file. The - in place of a filename causes comm to read its standard input (from the pipeline in this case). The result is a list of "words" (lines) that are not found in /usr/dict/words.
- The special character "|" tells the shell to connect the output of the previous command on the line to the input of the next command. That is, the output of the curl command is given as the input of the sed command.
- The character "\" at the end of a line continues the command on the next line, which allows a long pipeline to be split across several lines while remaining a single command line.
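The steps above can be tried without network access. The following is a minimal sketch, not the article's exact command: the function name unknown_words, the temporary word list and the sample sentence are all assumptions made for illustration. It sorts the word list itself, because comm expects both inputs to be sorted in the same collation order.

```shell
#!/bin/sh
# Hypothetical helper: print the words on standard input that are missing
# from the word list given as $1 (one lowercase word per line).
# The body runs in a subshell so the locale change stays local.
unknown_words() (
    export LC_ALL=C                # same collation for tr, sort and comm
    dict=$(mktemp) || exit 1
    sort -u "$1" > "$dict"         # comm needs sorted input on both sides
    sed 's/[^a-zA-Z ]/ /g' |       # non-letters become spaces
        tr 'A-Z ' 'a-z\n' |        # lowercase, one word per line
        grep '[a-z]' |             # drop blank lines
        sort -u |                  # sort and remove duplicates
        comm -23 - "$dict"         # keep lines found only on stdin
    rm -f "$dict"
)

# Illustrative three-word dictionary and sample text.
words=$(mktemp)
printf 'the\nquick\nfox\n' > "$words"
printf 'The quick brown fox, the lazy dog.\n' | unknown_words "$words"
rm -f "$words"
```

With this sample input the sketch lists brown, dog and lazy, the three words absent from the small word list.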
Pipelines in command line interfaces
Most Unix shells have a special syntax construct for the creation of pipelines. Typically, one simply writes the filter commands in sequence, separated by the ASCII vertical bar character "|" (which, for this reason, is often called the "pipe character" by Unix users). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer storage).
Error stream
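A minimal sketch of the standard error behaviour covered in this section, using Bourne-style syntax. The function noisy is a hypothetical stand-in for any command that writes to both streams.

```shell
#!/bin/sh
# Hypothetical command that writes one line to stdout and one to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Only stdout travels through the pipe; the stderr line bypasses tr
# and reaches the terminal unchanged.
noisy | tr 'a-z' 'A-Z'

# 2>&1 points stderr at the pipe as well, so tr receives both lines.
noisy 2>&1 | tr 'a-z' 'A-Z'
```

In the first pipeline only "to stdout" is upper-cased; in the second both lines are, because they reach tr through the same pipe.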
By default, the standard error streams ("stderr") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the console. However, many shells have additional syntax for changing this behaviour. In the csh shell, for instance, using "|&" instead of "|" signifies that the standard error stream too should be merged with the standard output and fed to the next process. The Bourne Shell can also merge standard error, using 2>&1, as well as redirect it to a different file.
Pipemill
In the most commonly used simple pipelines the shell connects a series of sub-processes via pipes, and executes external commands within each sub-process. Thus the shell itself is doing no direct processing of the data flowing through the pipeline.
However, it's possible for the shell to perform processing directly. This construct generally looks something like:
    command | while read var1 var2 ...; do
        # process each line, using variables as parsed into $var1, $var2, etc.
    done
This is referred to as a "pipemill", since the while loop is "milling" over the results of the initial command.
Example:
    find / /usr /var -mount -user foo -printf "%m %p\n" |
    while read -r mode filename; do
        chown "$NEWOWNER" "$filename"
        chmod "$mode" "$filename"
    done
(This example traverses the directory trees, changing the ownership of every file owned by the user "foo" to $NEWOWNER while preserving all permissions, including those, such as the setuid and setgid bits, that are stripped off by many versions of the chown command.)
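A harmless way to experiment with this pattern is to feed the loop fixed records and print the commands instead of running them. The sketch below assumes a hypothetical function plan_chmod and made-up paths; the while loop is wrapped in a function only so that it can be reused, and it is still fed through a pipe.

```shell
#!/bin/sh
# Hypothetical dry-run pipemill: reads "mode path" records from stdin and
# prints the chmod command that would restore each mode.
plan_chmod() {
    while read -r mode filename; do
        # read puts the first field in $mode and the rest of the line in
        # $filename, so paths containing spaces stay in one piece.
        printf 'chmod %s "%s"\n' "$mode" "$filename"
    done
}

# Sample records standing in for the output of find -printf "%m %p\n".
printf '%s\n' '644 /tmp/demo/a file.txt' '755 /tmp/demo/run.sh' | plan_chmod
```

Because read assigns the remainder of the line to the last variable, the path with a space survives intact; paths containing newlines would still break this line-oriented approach.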
There are a number of variations of the pipemill construct including:
    ps lax | {
        read x
        while read x owner pid parent x x x x x stat x; do
            [ "$owner" = "foo" ] && [ "$stat" = "Z" ] && kill "$parent"
        done
    }
(This example kills the parent processes for zombies owned/created by the user "foo").
Here the while loop is enclosed in a command group (the braces) and preceded by a read command, which effectively "throws away" the first line of the ps output. (Of course, in this particular example it would be harmless to process the header line, as it wouldn't match the "$owner" = "foo" test.) Note that the other references to the "x" variable are simply placeholders for "throwing away" irrelevant fields from each line.
The defining characteristics of a "pipemill" are: some command or series of commands feeds data into a pipe from which a shell while loop reads and processes it.
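The command-group variation can be sketched on fixed data rather than live ps output. The function name skip_header_sum and the two-column sample table are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical pipemill variation: discard a header line, then total the
# second column of the remaining lines.
skip_header_sum() {
    {
        read -r x                 # throw away the header line
        total=0
        while read -r x size; do  # x is a placeholder for the unused field
            total=$((total + size))
        done
        # Print inside the group: a loop fed by a pipe may run in a
        # subshell, so $total is only reliably visible here.
        echo "$total"
    }
}

printf '%s\n' 'NAME SIZE' 'a 10' 'b 32' | skip_header_sum
```

The result is printed inside the braces because, in many shells, each part of a pipeline runs in a subshell and variables set there are lost once the pipeline ends.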
Creating pipelines programmatically
Pipelines can be created under program control. The pipe() system call asks the operating system to construct a new anonymous pipe object.
This results in two new, opened file descriptors in the process: the read-only end of the pipe, and the write-only end.
The pipe ends appear to be normal, anonymous file descriptors, except that they have no ability to seek.
To avoid deadlock and exploit parallelism, the process with one or more new pipes will then, generally, call fork() to create new processes. Each process will then close the end(s) of the pipe that it will not be using before producing or consuming any data.
Alternatively, a process might create a new thread and use the pipe to communicate between them.
Named pipes may also be created using mkfifo() or mknod() and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with tee.
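From the shell, the mkfifo command offers the same facility. The following is a minimal sketch under assumed names: the function fifo_demo, the temporary directory and the file name demo.fifo are illustrative, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical demonstration: one process writes to a named pipe while
# another program is handed the pipe as if it were an ordinary input file.
fifo_demo() {
    dir=$(mktemp -d) || return 1
    fifo="$dir/demo.fifo"
    mkfifo "$fifo" || return 1

    # Opening a FIFO for writing blocks until a reader opens it as well,
    # so the writer runs in the background.
    printf 'one\ntwo\nthree\n' > "$fifo" &

    # The reader receives the FIFO as a plain file argument.
    sort -r "$fifo"

    wait            # reap the background writer
    rm -r "$dir"
}

fifo_demo
```

Unlike an anonymous pipe, the FIFO has a name in the file system, so the writer and the reader do not have to be started by the same pipeline.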
Implementation
In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: a sending program may produce 5000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but no data are lost. Instead, the output of the sending program is held in a buffer, or queue. When the receiving program is ready to read data, the operating system sends it data from the buffer, then removes that data from the buffer. If the buffer fills up, the sending program is suspended (blocked) until the receiving program has had a chance to read some data and make room in the buffer.
Network pipes
Tools like netcat and socat can connect pipes to TCP/IP sockets, following the Unix philosophy of "everything is a file".
History
The robot in the icon for Apple's Automator, which also uses a pipeline concept to chain repetitive commands together, holds a pipe as recognition of the application's Unix heritage.
Other operating systems
This feature of Unix was borrowed by other operating systems, such as Taos and MS-DOS, and eventually became the pipes and filters design pattern of software engineering.
See also
- Tee (Unix) for fitting together two pipes
- Pipeline (software) for the general software engineering concept.
- Pipeline (computer) for other computer-related pipelines.
- Hartmann pipeline
- Anonymous pipe, a FIFO structure used for interprocess communication
- Named pipe, a persistent pipe used for interprocess communication
- XML pipeline for processing of XML files
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.