ruby 2.0.0 での正規表現の新機能

13
Regexp.new('2.0.0') Ruby 2.0.0 での正規表現の新機能 西山和広 日本Rubyの会 Powered by Rabbit 2.0.8

Upload: kazuhiro-nishiyama

Post on 20-Aug-2015

1.141 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ruby 2.0.0 での正規表現の新機能

Regexp.new('2.0.0')Ruby 2.0.0 での正規表現の新機能

西山和広

日本Rubyの会

Powered by Rabbit 2.0.8

Page 2: Ruby 2.0.0 での正規表現の新機能

OnigmoOnigmo (Oniguruma-mod)

NEWS of Ruby 2.0.0 says following only:

Merge Onigmo

https://github.com/k-takata/Onigmo

Details are unknown詳細不明

1/12

Page 3: Ruby 2.0.0 での正規表現の新機能

New feature (1) \Kexamples without /\K/

"foobar".sub(/(?<=foo)bar/, "") #=> "foo""foobar".sub(/(?<=fo*)bar/, "") # SyntaxError: invalid pattern in look-behind: /(?<=fo*)bar/

examples with /\K/

"foobar".sub(/foo\Kbar/, "") #=> "foo""foobar".sub(/fo*\Kbar/, "") #=> "foo"

2/12

Page 4: Ruby 2.0.0 での正規表現の新機能

New feature (1) \KTreat the first non-blank character of the line.

examples with /\K/

gsub(/^ *\K(\d+)/) { $1.to_i+1 }

examples without /\K/

gsub(/^( *)(\d+)/) { "#{$1}#{$2.to_i+1}" }

3/12

Page 5: Ruby 2.0.0 での正規表現の新機能

New feature (2) \RLinebreak改行文字

Unicode:

(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])

Not Unicode:

(?>\x0D\x0A|[\x0A-\x0D])

4/12

Page 6: Ruby 2.0.0 での正規表現の新機能

New feature (3) \XeXtended grapheme cluster拡張書記素クラスタ

Unicode:

(?>\P{M}\p{M}*)

Not Unicode:

(?m:.)

5/12

Page 7: Ruby 2.0.0 での正規表現の新機能

Extended grapheme clusterexample:

"\u{304B 3099}"[/\X/].size #=> 2

U+304B HIRAGANA LETTER KA

U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK

see [UAX #29] for more detail(Unicode標準附属書29)

6/12

Page 8: Ruby 2.0.0 での正規表現の新機能

New feature (4)conditional expression:

(?(cond)yes)(?(cond)yes|no)

example:

" :f o o "[/:(['"])?(?(1)[\w\s]+\1|\w+)/] #=> ":f"":'f o o'"[/:(['"])?(?(1)[\w\s]+\1|\w+)/] #=> ":'f o o'"

7/12

Page 9: Ruby 2.0.0 での正規表現の新機能

(?adu)character set option (character range option)文字集合オプション (文字範囲オプション)

d: Default (compatible with Ruby 1.9.3)

a: ASCII

u: Unicode

see doc/RE in Onigmo for more detail

8/12

Page 10: Ruby 2.0.0 での正規表現の新機能

(?adu)examples:

"\u{3042}"[/\w/] #=> nil"\u{3042}"[/(?a)\w/] #=> nil"\u{3042}"[/(?d)\w/] #=> nil"\u{3042}"[/(?u)\w/] #=> "あ"

/a\b/ =~ "a\u{3042}" #=> nil /(?a)a\b/ =~ "a\u{3042}" #=> 0 /(?d)a\b/ =~ "a\u{3042}" #=> nil /(?u)a\b/ =~ "a\u{3042}" #=> nil

9/12

Page 11: Ruby 2.0.0 での正規表現の新機能

(?adu)(?-a), (?-d), (?-u) do not found

unlike (?-i), (?-m), (?-x)

10/12

Page 12: Ruby 2.0.0 での正規表現の新機能

Character Propertysupport for Unicode blocks

example:

/\p{InHiragana}/ =~ "\u3042" #=> 0/\p{InCJKUnifiedIdeographs}/ =~ "\u3042" #=> nil

see tool/enc-unicode.rb in Onigmo for more detail

11/12

Page 13: Ruby 2.0.0 での正規表現の新機能

/\z/