    Strings and Regular Expressions in PHP

    January 18, 2005 UPHPU - Mac Newbold

    String Syntax

    Single quotes: 'a string'

    No variable interpolation; \' and \\ are the only escape codes

    Double quotes: "a $better string\n"

    Variables work, standard escape codes work

    Here-doc syntax: $foo = <<<EOT, then the text, then EOT; on a line of its own
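    The three syntaxes can be compared side by side. This is a minimal sketch; the variable $better, its value, and the EOT label are made up for illustration.

```php
<?php
$better = 'nicer';

// Single quotes: no interpolation; only \' and \\ are treated as escapes.
$single = 'a $better string\n';

// Double quotes: variables and escape codes such as \n are processed.
$double = "a $better string\n";

// Here-doc: behaves like a double-quoted string that spans several lines.
$foo = <<<EOT
A $better string
on two lines
EOT;

echo $single, "\n", $double, $foo, "\n";
```

    The single-quoted version prints the dollar sign and the backslash literally; the other two substitute the variable.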

    String Operators

    Array-like character access:

    $str = 'MyBigString' => $str[2] == 'B' (offsets start at 0; the older $str{2} form was removed in PHP 8.0)

    Concatenation: the dot operator: "This lets you join strings into " . "bigger ones"

    Note: Avoiding embedded newlines in strings that wrap onto multiple lines is a good idea

    Concatenating assignment: .=    $str = "My name is "; $str .= "Mac.\n";
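    The operators on this slide, as a short runnable sketch. It uses square-bracket offsets, which work in every PHP version; the variable names other than $str are illustrative.

```php
<?php
$str = 'MyBigString';
$third = $str[2];                 // 'B' (offsets start at 0)

// The source line wraps, but the string contains no embedded newline.
$joined = 'This lets you join strings into '
        . 'bigger ones';

$name = 'My name is ';
$name .= "Mac.\n";                // same as $name = $name . "Mac.\n";

echo $third, "\n", $joined, "\n", $name;
```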

    Variables in Strings

    "Simple string with a $var in it\n"

    "You can use $an_array[$var] too\n"

    "Sometimes you need ${curl}ies to mark where the {$var}iable ends"

    (The ${curl} form is deprecated as of PHP 8.2; prefer {$curl}.)

    "Curlies help on {$big['fancy'][$stuff]} too"

    "Where it's confusing to embed " . $big['ugly'][$var] . "iables, break it up as needed with concatenation."
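    A sketch of the interpolation forms above. All variable names and array contents here are invented for the example; only the syntax is the point.

```php
<?php
$var = 'key';
$an_array = ['key' => 'value'];
$big = ['fancy' => ['key' => 'nested']];
$curl = 'curl';

$a = "Simple string with a $var in it\n";
$b = "You can use $an_array[$var] too\n";
// Without the braces PHP would look for a variable named $curlies.
$c = "Sometimes you need {$curl}ies\n";
$d = "Curlies help on {$big['fancy'][$var]} too\n";
// When the expression gets ugly, plain concatenation is easier to read.
$e = 'Or break it up: ' . $big['fancy'][$var] . "\n";

echo $a, $b, $c, $d, $e;
```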

    Must-Have String Functions

    www.php.net/strings

    echo/print: (print $foo) == 1; echo "can", $take, "more than one", "argument";

    Echo shortcut: <?= $foo ?>

    trim, ltrim, rtrim/chop: remove whitespace

    explode, implode/join: $arr = explode(" ", "List of words");

    $str = implode(",", $arr);
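    The functions on this slide in one short sketch. The sample strings are made up; the calls are the standard PHP ones named above.

```php
<?php
$foo = '  padded  ';

$result = print '';                 // print always returns 1
echo 'echo', ' takes', ' several', " arguments\n";

$trimmed = trim($foo);              // 'padded'
$left    = ltrim($foo);             // 'padded  '
$right   = rtrim($foo);             // '  padded' (chop is an alias of rtrim)

$arr = explode(' ', 'List of words');   // ['List', 'of', 'words']
$str = implode(',', $arr);              // 'List,of,words'

echo $str, "\n";
```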

    Obligatory C-like Functions

    All your old favorites are in there:

    printf, sprintf, sscanf, fprintf

    strcmp, strlen, strpos, strtok

    They all do just what you expect, though many of them have easier alternatives

    Gotcha: Some of them (like strpos and friends) return boolean false on failure, while 0 is a valid result. Always compare with === false.
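    The strpos gotcha is easiest to see with a match at offset 0. A minimal sketch, with an invented haystack:

```php
<?php
$haystack = 'foobar';

$pos     = strpos($haystack, 'foo');   // int 0: found at the very start
$missing = strpos($haystack, 'xyz');   // bool false: not found

// Wrong: 0 is falsy, so a match at offset 0 looks like "not found".
$loose = ($pos == false);

// Right: only a real boolean false means "not found".
$strict = ($pos === false);

printf(
    "loose says missing: %s, strict says missing: %s\n",
    var_export($loose, true),
    var_export($strict, true)
);
```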

    Basic String Manipulation

    Any of this can be done with regular expressions as well

    and in more complex cases, can only be done with regular expressions

    But regular expressions are slower (more later)

    str_replace("bar", "baz", "foobar");

    str_repeat("1234567890", 8);
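    The two calls above, with their results, plus the regular-expression equivalent of the replacement for comparison. A small sketch; the variable names are illustrative.

```php
<?php
// Arguments are: search, replacement, subject.
$replaced = str_replace('bar', 'baz', 'foobar');   // 'foobaz'

// Eight copies of a ten-character string make an 80-column ruler.
$ruler = str_repeat('1234567890', 8);

// The same replacement through PCRE gives the same result, with more overhead.
$viaRegex = preg_replace('/bar/', 'baz', 'foobar');

echo $replaced, "\n", $ruler, "\n";
```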

    Formatting functions

    strtolower, strtoupper

    ucfirst, ucwords: uppercase first char, or first char of each word

    wordwrap: wrap text to a given width

    str_pad("tooshort", 15, " ");

    vprintf, vfprintf, vsprintf: formatted output

    number_format: add thousands grouping

    money_format: format as currency (deprecated in PHP 7.4 and removed in PHP 8.0)
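    A quick tour of the formatting functions, as a sketch with invented sample values. money_format is left out because it no longer exists in current PHP.

```php
<?php
$lower   = strtolower('Hello World');            // 'hello world'
$upper   = strtoupper('Hello World');            // 'HELLO WORLD'
$first   = ucfirst('hello world');               // 'Hello world'
$words   = ucwords('hello world');               // 'Hello World'
$wrapped = wordwrap('The quick brown fox', 10, "\n");
$padded  = str_pad('tooshort', 15, ' ');         // right-padded to 15 characters
$line    = vsprintf('%s has %d items', ['cart', 3]);
$number  = number_format(1234567.891, 2);        // '1,234,567.89'

echo $wrapped, "\n", '[', $padded, "]\n", $line, "\n", $number, "\n";
```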

    Special-Purpose Functions

    One of PHP's strengths is the way it caters to the common things people need

    Many string functions are specifically for use with things like dates/times, URLs, HTML, and SQL databases

    Advice: When you need them, use them. Rolling your own doesn't usually work out the way you plan it.
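    The slide names no specific functions, so the three below are only examples of the kind of built-in it has in mind; the input strings and the search.php URL are made up.

```php
<?php
// HTML: escape markup characters before echoing user-supplied text.
$html = htmlspecialchars('<a href="x">Tom & Jerry</a>');

// URLs: encode a value before placing it in a query string.
$url = 'search.php?q=' . urlencode('strings & regexps');

// Dates: format a Unix timestamp (0 is the epoch, rendered here in UTC).
$day = gmdate('Y-m-d', 0);

echo $html, "\n", $url, "\n", $day, "\n";
```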

    Now for the fun stuff

    Regular Expressions

    PCRE

    POSIX

    Performance/Speed considerations

    Grab bag of cool string functions

    Regular Expressions

    Extremely powerful tool for pattern matching: the same thing used by compilers and interpreters to run your programs

    Two flavors in PHP:

    PCRE: Perl-Compatible Regular Expressions

    POSIX Extended

    PCRE advantages: multiple languages, more features, faster, and binary-safe

    Basics of REs

    They match patterns: the magic is in the pattern you tell them to match

    They have to be precise, including and excluding exactly what you want

    People get scared of them because the details can be tricky

    But they're one of the best tools you have for doing some pretty fancy string stuff

    RE Patterns

    Start with strings and grouping: abc(def)

    Add alternative branches: abc(def|123)

    Wildcard: . matches any char but \n

    Quantifiers/Repeating: * = 0 or more, + = 1 or more, ? = 0 or 1

    {n} = n times, {n,m} = n to m times

    (abc)+(def|123)*(.{2})*

    At least one abc, maybe some triplets (def or 123), then an even number of characters
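    The example pattern can be tried against a few subjects. In this sketch the ^ and $ anchors are an addition, not part of the slide's pattern; they force the whole subject to match so the results are easy to read. The test strings are invented.

```php
<?php
// Anchored version of the slide's pattern (anchors added for this demo).
$pattern = '/^(abc)+(def|123)*(.{2})*$/';

$results = [];
foreach (['abc', 'abcabcdef123', 'abcxy', 'abcx', 'def'] as $subject) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$subject] = preg_match($pattern, $subject);
}

foreach ($results as $subject => $matched) {
    echo str_pad($subject, 14), $matched ? 'match' : 'no match', "\n";
}
```

    'abcx' fails because one leftover character is not an even number of characters, and 'def' fails because there is no leading abc.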

    Character Classes and Types

    [] makes character classes

    List of characters and ranges: [a-zA-Z0-9]

    If you want to use -, put it at the beginning

    Escape any special chars with \ as usual

    If first char is ^, class is negated

    \d = [0-9], \D = [^0-9]

    \s = whitespace, \S = non-whitespace

    \w = [a-zA-Z0-9_], \W = [^a-zA-Z0-9_]

    \b = word boundary: a zero-width assertion
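    A sketch that exercises each class and type listed above. The subjects are invented; the patterns use only the syntax from this slide.

```php
<?php
$isWord   = preg_match('/^\w+$/', 'user_42');            // 1: letters, digits, underscore
$digits   = preg_replace('/\D/', '', '(801) 555-1234');  // strips every non-digit
$fields   = preg_split('/\s+/', "a  b\tc");              // split on runs of whitespace
$dashed   = preg_match('/^[-a-z]+$/', 'well-known');     // 1: a leading - is literal
$noVowel  = preg_match('/^[^aeiou]+$/', 'rhythm');       // 1: negated class
$wholeCat = preg_match('/\bcat\b/', 'concatenate');      // 0: "cat" is not a whole word here

echo $digits, "\n", implode('|', $fields), "\n";
```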

    Anchors

    What if you want to force it to match only at the beginning of the string? Or to match the entire string?

    Use an anchor!

    ^ as the first char anchors the beginning

    $ as the last char anchors the end

    (Varies slightly in multi-line mode)
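    A sketch of the anchors, including the multi-line variation. The subjects are made up for the demonstration.

```php
<?php
$subject = 'a foo here';

$anywhere = preg_match('/foo/', $subject);     // 1: unanchored, matches mid-string
$atStart  = preg_match('/^foo/', $subject);    // 0: the subject does not start with foo
$atEnd    = preg_match('/here$/', $subject);   // 1: the subject ends with here
$whole    = preg_match('/^foo$/', 'foo');      // 1: the entire string is foo

// In multi-line mode (/m), ^ and $ also match at line breaks.
$text       = "one\ntwo\nthree";
$perLine    = preg_match_all('/^\w+$/m', $text);   // 3: one match per line
$singleLine = preg_match_all('/^\w+$/', $text);    // 0: the string as a whole is not one word

echo $perLine, ' ', $singleLine, "\n";
```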

    Greediness and Modifiers

    Regular Expressions are Greedy

    They'll keep eating characters as long as they can keep matching.

    Consider: <.*> vs. <[^>]*> when matching against <b>Hi</b> (the first grabs the whole string, the second stops at <b>)

    PCRE has modifiers: /pattern/modifiers

    /i = case insensitive

    /U = un-greedy

    /m = multi-line
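    Greediness is easiest to see on a short piece of markup. In this sketch the subject <b>Hi</b> is an assumed example, and the lazy quantifier .*? is shown as a per-quantifier alternative to the /U modifier.

```php
<?php
$subject = '<b>Hi</b>';

preg_match('/<.*>/', $subject, $greedy);      // greedy: runs to the last >
preg_match('/<[^>]*>/', $subject, $class);    // class: cannot cross a >
preg_match('/<.*>/U', $subject, $ungreedy);   // /U flips the default to un-greedy
preg_match('/<.*?>/', $subject, $lazy);       // ? makes just this quantifier lazy

$caseInsensitive = preg_match('/hi/i', $subject);   // 1: /i ignores case

echo $greedy[0], "\n", $class[0], "\n", $ungreedy[0], "\n", $lazy[0], "\n";
```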

    Back References

    Most commonly used in replace operations, but can be used in match patterns as well

    Parentheses not only group, but capture too

    Use \ followed by the number of the capture

    ab(.)\1(.)\2 will match abccdd or abxxyy, but not abcccd or abdcdc

    Can get tricky to count which backref goes where with nested parentheses
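    The slide's pattern, run against its own four examples, followed by a back reference in a replacement. The replacement example ('hello world') is invented to show the more common use.

```php
<?php
// Single quotes keep \1 and \2 intact for the regex engine.
$pattern = '/ab(.)\1(.)\2/';

$matches = [];
foreach (['abccdd', 'abxxyy', 'abcccd', 'abdcdc'] as $subject) {
    $matches[$subject] = preg_match($pattern, $subject);
}

// In a replacement string, $1 and $2 refer to the captured groups.
$swapped = preg_replace('/(\w+) (\w+)/', '$2 $1', 'hello world');

print_r($matches);
echo $swapped, "\n";
```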

    Modifiers for Parentheses

    PCRE only: makes some things possible that otherwise couldn't be done

    Non-capturing grouping: (?: )

    Can simplify back-reference counting

    Look-ahead Assertions:

    They don't advance the matching position

    Positive: (?= ), or Negative: (?! )

    Very powerful, but not always easy to understand. Trial and error can be your friend!
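    In the spirit of trial and error, here is a small sketch of each construct. The patterns and subjects are invented for illustration.

```php
<?php
// Non-capturing group: the month alternation groups without capturing,
// so the year is capture 1 rather than capture 2.
preg_match('/(?:Jan|Feb) (\d{4})/', 'Feb 2005', $m);

// Positive look-ahead: "foo" only when followed by "bar".
// The look-ahead does not consume "bar", so it is not part of the match.
preg_match('/foo(?=bar)/', 'foobar', $pos);

// Negative look-ahead: "foo" only when NOT followed by "bar".
$neg = preg_match('/foo(?!bar)/', 'foobar');
$ok  = preg_match('/foo(?!bar)/', 'foobaz');

echo $m[1], "\n", $pos[0], "\n", $neg, ' ', $ok, "\n";
```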

    PCRE Specifics

    www.php.net/pcre

    preg_match, preg_match_all, preg_replace, preg_split, preg_grep (filter an array)

    Perl REs have a delimiter, usually /, but it can be any character that is not alphanumeric, a backslash, or whitespace:

    preg_match("/foo/", $bar);

    preg_match("%/usr/local/bin/%", $path);
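    One call to each of the functions listed above, as a sketch. The values of $bar and $path and the other inputs are made up.

```php
<?php
$bar   = 'some food';
$found = preg_match('/foo/', $bar);                  // 1

// With % as the delimiter, the slashes in the path need no escaping.
$path  = '/usr/local/bin/php';
$inBin = preg_match('%/usr/local/bin/%', $path);     // 1

$count = preg_match_all('/\d+/', 'a1 b22 c333', $all);   // all matches land in $all[0]
$clean = preg_replace('/\s+/', ' ', "too   many\tspaces");
$parts = preg_split('/[\s,]+/', 'a, b  c');

// preg_grep keeps the original keys, so reindex for a clean list.
$php = array_values(preg_grep('/\.php$/', ['a.php', 'b.txt', 'c.php']));

echo $count, "\n", $clean, "\n", implode('|', $parts), "\n", implode(',', $php), "\n";
```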

    POSIX Specifics

    www.php.net/regex

    ereg, ereg_replace, split, eregi, spliti, etc. (all deprecated in PHP 5.3 and removed in PHP 7.0)

    [Only] Advantage over PCRE (in PHP 4 and 5): It doesn't require the PCRE library to be installed, so it's always there in any PHP installation

    Other regex engines support this specification, though the Perl style seems to be more popular.

    Almost there

    Intro to Strings in PHP

    (Feel free to tell me how fast or slow to go)

    Functions relating to HTML, SQL, etc.

    Regular Expressions

    PCRE

    POSIX

    Performance/Speed considerations

    Grab bag of cool string functions

    Performance/Speed

    Rule of thumb: use the simplest function that will get the job done right

    strpos instead of strstr (when you only need to know whether a substring occurs)

    str_replace instead of preg_replace

    And so forth

    The PHP manual online usually includes notes about speed differences

    PCRE is faster than POSIX Regex
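    The rule of thumb in practice, as a sketch with an invented haystack. No timings are shown; the point is only which function fits which job.

```php
<?php
$haystack = 'The quick brown fox';

// Plain substring test: strpos is enough, no regex engine involved.
$hasFox = strpos($haystack, 'fox') !== false;

// Fixed-string replacement: str_replace rather than preg_replace.
$swapped = str_replace('fox', 'dog', $haystack);

// Reach for PCRE only when the pattern genuinely needs it,
// for example "contains a word of exactly five letters".
$hasFiveLetterWord = preg_match('/\b\w{5}\b/', $haystack) === 1;

echo $swapped, "\n";
```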

    Grab Bag

    md5, md5_file: calculate md5 hashes

    Handy for checksums and fingerprints; for passwords in databases use password_hash and password_verify instead, since a fast, unsalted md5 is not safe there

    levenshtein, similar_text: calculate the similarity of two strings

    metaphone, soundex: calculate how similar two strings sound when spoken out loud

    str_rot13: "encryption" algorithm

    Protected by the DMCA
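    A sketch that calls each grab-bag function once. The inputs are invented; the metaphone result is printed rather than asserted because its exact key is not the point here.

```php
<?php
$hash  = md5('hello');                              // 32 hex characters
$dist  = levenshtein('kitten', 'sitting');          // number of single-character edits
$same  = similar_text('World', 'Word');             // number of matching characters
$sound = soundex('Robert') === soundex('Rupert');   // both names get the same soundex key
$meta  = metaphone('Thompson');                     // a phonetic key for the name
$rot   = str_rot13('Hello');                        // letters rotated by 13 places

echo $hash, "\n", $dist, ' ', $same, "\n", $meta, "\n", $rot, "\n";
```

    Applying str_rot13 twice returns the original string, which says a lot about its strength as encryption.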

    Grab Bag 2

    str_shuffle: words are much more fun once they've been randomized

    count_chars, str_word_count: statistics about your strings

    strrev: if it doesn't make sense forward, try it backwards
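    The last few functions in one sketch, with invented inputs. Note that the reverse function is spelled strrev, without an underscore.

```php
<?php
$word     = 'randomized';
$shuffled = str_shuffle($word);            // same characters, random order

// Mode 1 returns byte value => count, only for bytes that occur.
$counts = count_chars('banana', 1);

$words = str_word_count('The quick brown fox');
$back  = strrev('forward');

echo $shuffled, "\n", $counts[ord('a')], ' ', $words, "\n", $back, "\n";
```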