Perl


 

Perl, also Practical Extraction and Report Language (a backronym, see below), is an interpreted procedural programming language designed by Larry Wall. Perl borrows features from C, shell scripting (sh), awk, sed, and (to a lesser extent) many other programming languages.

Language structure

Example Program

In Perl, the canonical "hello world" program is:

#!/usr/bin/perl -w
print "Hello, world!\n";

The first line is the shebang, which gives the path to the interpreter in the file system; the -w switch turns on warnings. The second line prints the string "Hello, world!" followed by a newline, written "\n" (like a person pressing 'Return' or 'Enter').

The shebang shown above is typical of Unix-like systems. On Windows, assuming that the perl executable is on the command path, one could use the following shebang line:

#!perl

The shebang is the most common way, but not the only one, to ensure that the perl interpreter runs the program. On Windows, the .pl file type can instead be associated with the Perl interpreter; some Perl installations set up this association automatically.

Here is a one-line, throwaway Perl program that performs ROT13 encoding and decoding. It is entered and run directly on the command line:

perl -pe 'tr/A-Za-z/N-ZA-Mn-za-m/' < input_file > output_file
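
The -e switch supplies the program text, and -p runs it once for each input line (held in $_) and then prints $_. A minimal sketch of the same transformation as a reusable subroutine, assuming Perl 5 with strict and warnings; the name rot13 is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper applying the same tr/// mapping as the one-liner.
sub rot13 {
    my ($text) = @_;     # copy the argument so the caller's string is untouched
    # A..M map to N..Z and N..Z wrap around to A..M (likewise lowercase);
    # characters outside the ranges, such as punctuation, are left alone.
    $text =~ tr/A-Za-z/N-ZA-Mn-za-m/;
    return $text;
}

my $encoded = rot13("Hello, world!");
print "$encoded\n";              # Uryyb, jbeyq!

my $decoded = rot13($encoded);   # applying ROT13 twice restores the input
print "$decoded\n";              # Hello, world!
```

Because ROT13 is its own inverse, the same subroutine both encodes and decodes.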

Data types

Perl has three fundamental data types: scalars, lists, and hashes:

  • A scalar is a single value (i.e. a number, string or reference)
  • A list is an ordered collection of scalars (a variable that holds a list is called an array)
  • A hash, or associative array, is a map from strings to scalars; the strings are called keys and the scalars are called values.

All variables are marked by a leading sigil, which identifies the data type. The same name may be used for variables of different types without conflict:

$foo   # a scalar
@foo   # an array (a variable holding a list)
%foo   # a hash

Numbers are written in the usual way; strings are enclosed in quotes of various kinds. Double-quoted strings interpolate variables and escape sequences such as \n, while single-quoted strings do not.

$n = 42;
$name = "joe";
$color = 'red';
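
A short sketch of the difference between the two quote kinds, assuming strict and warnings are enabled; inside single quotes only \\ and \' are special:

```perl
use strict;
use warnings;

my $name  = "joe";
my $color = 'red';

my $double = "$name likes $color\n";   # variables interpolated, \n is a newline
my $single = '$name likes $color\n';   # taken literally: no interpolation, no escapes

print $double;          # joe likes red
print $single, "\n";    # $name likes $color\n
```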

A list may be written by listing its elements, separated by commas and enclosed in parentheses where required by operator precedence.

@scores = (32, 45, 16, 5);

A hash may be initialized from a list of key/value pairs. The => operator acts like a comma but also quotes a bareword on its left, so the keys need no quotes.

%favorite = (joe => 'red',
             sam => 'blue');

Individual elements of a list are accessed by providing a numerical index in square brackets. Individual values in a hash are accessed by providing the corresponding key in curly braces. The $ sigil identifies the accessed element as a scalar.

$scores[2]        # an element of @scores
$favorite{joe}    # a value in %favorite
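
A self-contained sketch of element access, reusing the @scores and %favorite data above; the extra key ann and the new element are made up for illustration. Array indices start at 0, and negative indices count back from the end:

```perl
use strict;
use warnings;

my @scores   = (32, 45, 16, 5);
my %favorite = (joe => 'red', sam => 'blue');

# The $ sigil marks each accessed element as a scalar.
my $first = $scores[0];        # 32
my $third = $scores[2];        # 16
my $last  = $scores[-1];       # 5
my $joes  = $favorite{joe};    # 'red'

# Assignment uses the same syntax.
$favorite{ann} = 'green';      # adds a new key/value pair
$scores[4]     = 50;           # grows the array to five elements
```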

The number of elements in an array can be obtained by evaluating the array in scalar context.

$count = @friends;

There are a few functions that operate on entire hashes.

@names = keys %address;
@addresses = values %address;
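
A runnable sketch of the last two examples; the contents of @friends and %address are hypothetical sample data. Hash order is unspecified, but keys and values return their elements in matching order as long as the hash is not modified in between:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @friends = ('joe', 'sam', 'ann');
my %address = (joe => '12 Elm St', sam => '3 Oak Ave');

my $count = @friends;               # array in scalar context: 3

my @names     = keys %address;      # ('joe', 'sam') in some order
my @addresses = values %address;    # in the same order as @names

# Sort the keys when a predictable order is needed.
for my $name (sort keys %address) {
    print "$name lives at $address{$name}\n";
}
```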

Control structures

Perl has several kinds of control structures. It has block-oriented control structures, similar to those in the C and Java programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:

label while ( cond ) { ... }
label while ( cond ) { ... } continue { ... }
label for ( init-expr ; cond-expr ; incr-expr ) { ... }
label foreach var ( list ) { ... }
label foreach var ( list ) { ... } continue { ... }
if ( cond ) { ... }
if ( cond ) { ... } else { ... }
if ( cond ) { ... } elsif ( cond ) { ... } else { ... }

In the loop forms, label is optional; it names the loop so that the loop-control keywords next, last, and redo can refer to it.
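
A small sketch of a labeled loop; the label ROW and the search for a product of 12 are invented for illustration. The label lets next and last act on the outer loop from inside the inner one:

```perl
use strict;
use warnings;

my @found;
ROW: for my $row (1 .. 5) {
    for my $col (1 .. 5) {
        next ROW if $col > $row;        # abandon the rest of this row
        if ($row * $col == 12) {
            @found = ($row, $col);
            last ROW;                   # leaves both loops, not just the inner one
        }
    }
}
print "found: @found\n";                # found: 4 3
```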

Where only a single statement is being controlled, statement modifiers provide a lighter syntax:

statement if cond ;
statement unless cond ;
statement while cond ;
statement until cond ;
statement foreach list ;
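
A sketch exercising each modifier form; the variables are illustrative only:

```perl
use strict;
use warnings;

my @log;
my $verbose = 1;
push @log, "starting"   if $verbose;     # runs only when the condition is true
push @log, "quiet mode" unless $verbose; # runs only when it is false

my $n = 0;
$n++ while $n < 5;                       # repeats while the condition holds

my $m = 10;
$m -= 3 until $m < 0;                    # repeats until the condition holds

my $total = 0;
$total += $_ foreach (1, 2, 3, 4);       # $_ is set to each element in turn
```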

Short-circuit logical operators are commonly used to effect control flow at the expression level:

expr and expr
expr or expr

The low-precedence and and or also have higher-precedence counterparts, && and ||. This style is especially useful because Perl treats the flow-control keywords return, redo, next, and last as expressions, whereas many other languages treat them as statements.
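
A sketch of expression-level control flow. The classic idiom is open(...) or die ...; here the error is recorded instead so the example runs to completion. The path and the subroutine name first_positive are hypothetical:

```perl
use strict;
use warnings;

# The right side of `or` runs only if the left side is false.
my $path  = '/nonexistent/config.txt';    # hypothetical path
my $error = '';
open(my $fh, '<', $path) or $error = "cannot open $path: $!";

# next and last are expressions, so they can follow `and`.
my @kept;
for my $n (1 .. 10) {
    $n % 2 and next;     # odd number: skip to the next iteration
    $n > 6 and last;     # past 6: stop the loop entirely
    push @kept, $n;
}

# return can appear on the right-hand side too.
sub first_positive {
    for my $v (@_) {
        $v > 0 and return $v;
    }
    return;
}
```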

Perl also has two implicit looping constructs:

results = grep { ... } list
results = map { ... } list

grep returns all elements of list for which the controlled block evaluates to true. map evaluates the controlled block for each element of list and returns a list of the resulting values. In both cases the block sees the current element as $_. These constructs enable a simple functional programming style.
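
A short sketch of grep and map, alone and chained; the numbers are arbitrary:

```perl
use strict;
use warnings;

my @numbers = (1 .. 10);

# grep keeps the elements for which the block returns true.
my @evens = grep { $_ % 2 == 0 } @numbers;    # (2, 4, 6, 8, 10)

# map collects the block's value for every element.
my @squares = map { $_ * $_ } @evens;         # (4, 16, 36, 64, 100)

# Chained right to left: sum of the squares of the odd numbers.
my $sum = 0;
$sum += $_ for map { $_ * $_ } grep { $_ % 2 } @numbers;   # 165
```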

Perl 5 originally had no switch (multi-way branch) statement. The Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures, none entirely satisfactory.

A very general and flexible switch statement has been designed for Perl 6. The Switch module makes most of the functionality of the Perl 6 switch available to Perl 5 programs.

Perl includes a goto label statement, but it is rarely used. It is considered poor coding practice, the implementation is slow, and situations where a goto is called for in other languages either tend not to occur in Perl or are better handled with other control structures, such as labeled loops.

There is also a goto &sub statement that performs a tail call: it terminates the current subroutine and immediately calls the named subroutine, passing it the current contents of @_. Use of this form is culturally accepted but unusual, because it is rarely needed.
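
A sketch of goto &sub used in a wrapper; the names target, logged_target, and @calls are hypothetical. The wrapper records the call, then replaces its own frame with target, which receives the current @_ and returns straight to the original caller:

```perl
use strict;
use warnings;

my @calls;    # hypothetical call log

sub target {
    my ($x, $y) = @_;
    return $x + $y;
}

sub logged_target {
    push @calls, "target(@_)";
    goto &target;    # tail call: target runs with this @_ in place of logged_target
}

my $result = logged_target(2, 3);    # 5
```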

Subroutines

Subroutines are defined with the sub keyword and invoked simply by naming them. Subroutine definitions may appear anywhere in the program. Parentheses are required for calls that precede the definition.

foo();             # parentheses required: foo is not yet declared
sub foo { ... }
foo;               # parentheses optional after the definition

A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.

foo $a, @b, %c;

The parameters to a subroutine need not be declared as to either number or type; in fact, they may vary from call to call. Arrays are expanded to their elements, hashes are expanded to a list of key/value pairs, and the whole lot is passed into the subroutine as one undifferentiated list of scalars.
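
A sketch that counts what actually arrives in @_; count_args is a hypothetical helper. Passing references (themselves scalars) is the usual way to keep an array or hash intact:

```perl
use strict;
use warnings;

# Hypothetical helper: how many scalars arrived in @_?
sub count_args { return scalar @_ }

my $x = 'one';
my @y = (1, 2, 3);
my %z = (a => 1, b => 2);

my $n = count_args($x, @y, %z);      # 1 + 3 + 4 = 8: one flat list
my $m = count_args($x, \@y, \%z);    # 3: each reference is a single scalar
```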

Whatever arguments are passed are available to the subroutine in the special array @_. The elements of @_ are aliased to the actual arguments: changing an element of @_ changes the corresponding argument. Elements of @_ may be accessed by subscripting it in the usual way.

$_[0], $_[1]

However, the resulting code can be difficult to read, and the parameters have pass-by-reference semantics, which may be undesirable.
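
A sketch of the aliasing behaviour; double_first and double_all are hypothetical names. Calling either one with a literal such as 21 would die with "Modification of a read-only value attempted", since there is no variable to alias:

```perl
use strict;
use warnings;

# Assigning through $_[0] writes to the caller's variable.
sub double_first {
    $_[0] *= 2;
    return;
}

# foreach aliases $_ to each element of @_, which in turn aliases
# the caller's arguments, so every argument is doubled in place.
sub double_all {
    $_ *= 2 for @_;
    return;
}

my $p = 21;
double_first($p);     # $p is now 42

my @q = (1, 2, 3);
double_all(@q);       # @q is now (2, 4, 6)
```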

One common idiom is to assign @_ to a list of named variables.

my ($x, $y, $z) = @_;

This provides both mnemonic parameter names and pass-by-value semantics.

Another idiom is to shift parameters off @_. This is especially common when the subroutine takes only one argument.

my $x = shift;     # inside a subroutine, shift defaults to @_
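
A sketch of both idioms inside complete subroutines; describe and greet are hypothetical names:

```perl
use strict;
use warnings;

# List assignment copies @_ into named lexicals.
sub describe {
    my ($name, $age, $city) = @_;
    return "$name, $age, from $city";
}

# shift with no argument removes and returns the first element of @_.
sub greet {
    my $name = shift;
    return "Hello, $name!";
}

my $who   = 'joe';
my $line  = describe($who, 30, 'Boston');   # "joe, 30, from Boston"
my $hello = greet($who);                    # "Hello, joe!"
```

Because both subroutines work on copies, reassigning $name inside them would not affect $who.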

Subroutines may return values.

return 42, $x, @y, %z;

If the subroutine does not exit via a return statement, it returns the value of the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.

The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.

sub list { (4, 5, 6) }
sub array { my @a = (4, 5, 6); @a }
$a = list;     # returns 6 - last element of list
$a = array;    # returns 3 - number of elements in array
@a = list;     # returns (4, 5, 6)
@a = array;    # returns (4, 5, 6)

The wantarray function reports the context in which the current subroutine was called: true in list context, false in scalar context, and undef in void context.

sub either { wantarray ? (1, 2) : "Oranges" }
$a = either;   # returns "Oranges"
@a = either;   # returns (1, 2)
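
A runnable version of the context examples under strict, using lexical variables (the variable names are ours). It also shows scalar(), which forces scalar context where a list would otherwise be expected:

```perl
use strict;
use warnings;

sub list   { (4, 5, 6) }                  # a literal list
sub array  { my @a = (4, 5, 6); @a }      # an array variable
sub either { wantarray ? (1, 2) : "Oranges" }

my $s1 = list;      # 6: a comma list in scalar context yields its last value
my $s2 = array;     # 3: an array in scalar context yields its length
my @l1 = list;      # (4, 5, 6)
my @l2 = array;     # (4, 5, 6)
my $e1 = either;    # "Oranges"
my @e2 = either;    # (1, 2)

my @mixed = (scalar(array), either);   # (3, 1, 2)
```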

Regular expressions

The Perl language includes a specialized syntax for writing regular expressions (REs), and the interpreter contains an engine for matching strings to regular expressions. The regular expression engine uses a backtracking algorithm, extending its capabilities from simple pattern matching to string capture and substitution.

The Perl regular expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl, and has since grown to include many more features.

The m// (match) operator introduces a regular expression match. (The leading m may be omitted when slashes are used as delimiters.) In the simplest case, an expression like

$x =~ m/abc/

evaluates to true if and only if the string $x contains a match for the regular expression abc. Capturing a matched string can be done by surrounding part of the regular expression with parentheses and evaluating the match in list context. This is more interesting for patterns that can match multiple strings:

($matched) = $x =~ m/a(.)c/;   # capture the character between 'a' and 'c'
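
A sketch of capture in list context; the sample strings are made up. Each pair of parentheses yields one value, and a failed match yields an empty list:

```perl
use strict;
use warnings;

my $x = "xabcx";
my ($matched) = $x =~ m/a(.)c/;                # 'b'

# Several groups capture several values (also available as $1, $2, $3).
my $date = "2004-07-15";
my ($year, $month, $day) = $date =~ m/^(\d+)-(\d+)-(\d+)$/;

# No match: the empty list leaves $none undefined.
my ($none) = "xyz" =~ m/a(.)c/;
```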

The s/// (substitute) operator specifies a search-and-replace operation:

$x =~ s/abc/aBc/;    # upcase the b

Perl regular expressions can take modifiers. These are single-letter suffixes that modify the meaning of the expression:

$x =~ m/abc/i;       # case-insensitive pattern match
$x =~ s/abc/aBc/g;   # global search and replace
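
A sketch comparing substitution with and without modifiers on one sample string. The (my $copy = $x) =~ s/// form copies first, so $x itself is never changed; s/// returns the number of substitutions made:

```perl
use strict;
use warnings;

my $x = "abc ABC abc";

(my $once   = $x) =~ s/abc/aBc/;     # first match only: "aBc ABC abc"
(my $global = $x) =~ s/abc/aBc/g;    # every match:      "aBc ABC aBc"
(my $nocase = $x) =~ s/abc/aBc/gi;   # ignoring case:    "aBc aBc aBc"

my $copy  = $x;
my $count = ($copy =~ s/abc/xyz/g);  # 2 replacements; $copy is "xyz ABC xyz"
```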

Regular expressions can be dense and cryptic, because regular expression syntax is extremely compact, generally using single characters or character pairs to represent its operations. Perl provides relief from the problem with the /x modifier, which allows programmers to place whitespace and comments inside regular expressions:

$x =~ m/a   # match 'a'
        .   # match any character
        c   # match 'c'
       /x;

One common use of regular expressions is to specify delimiters for the split operator:

@words = split m/,/, $line;   # divide $line into comma-separated values

The split operator complements string capture: string capture returns the parts of a string that match the regular expression, while split returns the parts that do not.
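
A sketch of split with a few patterns; the input strings are made up. Empty fields in the middle are kept, and an optional third argument limits how many fields are returned:

```perl
use strict;
use warnings;

my $line  = "red,green,,blue";
my @words = split m/,/, $line;                   # ('red', 'green', '', 'blue')

# The pattern can match variable-width separators.
my @fields = split m/\s*,\s*/, "a , b,c ,  d";   # ('a', 'b', 'c', 'd')

# A limit of 2 splits only at the first '='.
my ($key, $value) = split m/=/, "name=joe=smith", 2;   # ('name', 'joe=smith')
```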

See also Perl regular expression examples.

Database interfaces

Perl is widely favored for database applications. Its text-handling facilities are good for generating SQL queries, and arrays, hashes, and automatic memory management make it easy to collect and process the returned data.

In early versions of Perl, database interfaces were created by relinking the interpreter with a client-side database library. This was somewhat clumsy; a particular problem was that the resulting perl executable was restricted to using just the one database interface that it was linked to. Also, relinking the interpreter was sufficiently difficult that it was only done for a few of the most important and widely used databases.

In Perl 5, database interfaces are implemented by Perl DBI modules. The DBI (Database Interface) module presents a single, database-independent interface to Perl applications, while the DBD:: (Database Driver) modules handle the details of accessing some 50 different databases. There are DBD:: drivers for most ANSI SQL databases.
