Functional Python

with scheme
with python
- The Magnificent Pipe
- trans scheme to python
dict
gc

with scheme

Last time, reading input from a file in Chez Scheme looked like a hassle, so I cut corners. After all, read and read-char were the only two input procedures I could find.

While looking around for material, I came across the examples in The Scheme Programming Language, Fourth Edition. Luckily they also deal with trees, so that's two birds with one stone.

> (define ip (open-input-file "SEED"))
> (read ip)
1
> (read ip)
2

If you wrap the whole input file in parentheses, like ( … ),

> (read (open-input-file "SEED"))
(1 2 3 4 5 6 7 8 9 10)

it becomes a single S-expression, so one call to read does the job. With large data it eats up resources all at once, though. Still, it's a trick worth knowing.
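For comparison, here's a rough Python sketch of the same trade-off (my own illustration, not from the book): slurp the whole file with one read(), or stream it line by line. The helper names read_all and read_stream and the sample file contents are made up for the example.

```python
import os
import tempfile

def read_all(path):
    # One shot: the whole file is loaded into memory at once (like the parenthesized read)
    with open(path) as f:
        return f.read().split()

def read_stream(path):
    # Streaming: one line at a time, so memory use stays small even for big files
    with open(path) as f:
        for line in f:
            yield from line.split()

# Hypothetical sample file standing in for SEED
path = os.path.join(tempfile.mkdtemp(), "SEED")
with open(path, "w") as f:
    f.write("1 2 3\n4 5\n")

print(read_all(path))
print(list(read_stream(path)))
```

Both give the same tokens; the difference only shows up as memory use once the file gets big.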

> (define zz (read (open-input-file "SEED")))
> zz
(q0.000000p q0.123456p q0.246912p q0.370368p q0.493824p
  q0.617280p q0.740736p q0.864192p q0.987648p)
> (map symbol? zz)
(#t #t #t #t #t #t #t #t #t)

Pi Day has already come and gone, but it reminded me of something: they say all kinds of sequences turn up in the digits of π. (See "PI に時間を".)

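π is the number of the gods, so I'll politely leave it alone and instead search for arbitrary patterns in a sequence I built artificially. It's the sequence from last time: start at 0 and keep adding the prime 997.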

When it comes to pattern matching, regular expressions are the tool, so I'll use gosh (Gauche). A little homework first.

gosh$ (rxmatch (string->regexp "123") "ww123ff")
#<<regmatch> 0x7f2dbcc63810>
gosh$ (rxmatch (string->regexp "123") "wwff")
#f
gosh$ (iota 10 0 997)
(0 997 1994 2991 3988 4985 5982 6979 7976 8973)

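After that, it's just a matter of picking handy procedures from section 6.6.6 of info gauche (procedures that walk over lists). There are so many that it's hard to choose. Here's what I ended up with.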

(define seed (map number->string
                  (iota 100000 0 997)))

(define chk
  (lambda (qn)
    (let* ((rx  (string->regexp (number->string qn)))
           (res (count (lambda (k) (rxmatch rx k)) seed)))
      (format #t "~7d ~a\n" qn res))))

(for-each chk '(8888 2828 9696 298250 12345 777))

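Using an integer version of the seq I leaned on in previous posts, I generate 100,000 numbers and count how many of them contain each pattern as a substring. The patterns were picked purely on a whim: 8888 ("pal pal pal pal", since 8 is "pal" in Korean and a favorite number in China and Korea), 2828 ("niya niya", a grin, which suits a manly guy like me), 9696 ("kuro guro", jet black, chosen in hope since my hair has been thinning lately), 298250 ("niku-ya ni go", go to the butcher's, which would probably do me good), 12345, which looks like something out of a password, and, naturally, triple seven.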

sakae@deb:/tmp$ gosh incp.scm
   8888 46
   2828 50
   9696 51
 298250 0
  12345 4
    777 557

With this, the search terms are hard-coded, so you could also read one interactively:

gosh$ (chk (read))
33
     33 6317
#<undef>

and so on.

gosh$ (use math.prime)
gosh$ (take *primes* 200)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199
 211 223 227 229 ....)
gosh$ (take (drop *primes* 150) 20)
(877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997 1009
 1013)

What's the next prime after 997? There's a ready-made prime sequence for that (it's 1009). Gauche really has everything.
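Python's standard library has no ready-made prime sequence like Gauche's *primes*, so here's a minimal trial-division sketch to double-check. is_prime and next_prime are hypothetical helpers of my own, fine for small numbers like these but not meant for big ones.

```python
def is_prime(n):
    # Trial division up to sqrt(n)
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime(n):
    # Smallest prime strictly greater than n
    m = n + 1
    while not is_prime(m):
        m += 1
    return m

print(next_prime(997))  # 1009
```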

with python

I want to do the same thing in Python too. A little homework first.

toolz

Getting a taste of the essence of functional programming in Python

Functional Programming Modules (std python)

Functional Programming HOWTO

The HOWTO seems to nudge you toward list comprehensions. Well, even the great Haskell uses them.

In [10]: [str(x) for x in range(10,30,3)]
Out[10]: ['10', '13', '16', '19', '22', '25', '28']

In [16]: import toolz
In [18]: ss = [str(x) for x in range(10,30,3)]

In [19]: [x for x in toolz.take(2,ss)]
Out[19]: ['10', '13']

In [20]: toolz.pipe(toolz.take(2,ss), list)
Out[20]: ['10', '13']
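If you'd rather not pull in toolz just for this, the standard library's itertools.islice gives the same result as toolz.take here: it lazily yields the first n items. A small sketch:

```python
from itertools import islice

ss = [str(x) for x in range(10, 30, 3)]

# islice(iterable, n) lazily yields the first n items
first_two = list(islice(ss, 2))
print(first_two)  # ['10', '13']
```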

The Magnificent Pipe

I love pipes. Here's the one toolz provides:

def pipe(data, *funcs):
    """ Pipe a value through a sequence of functions

    I.e. ``pipe(data, f, g, h)`` is equivalent to ``h(g(f(data)))``

    We think of the value as progressing through a pipe of several
    transformations, much like pipes in UNIX

    ``$ cat data | f | g | h``
    """
    for func in funcs:
        data = func(data)
    return data

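Being a manly sort, I often overlook the difference between singular and plural. I'd have written for f in funcs:, but what do you think? It's a short piece of code, so I'll lift just this part and use it. The '*' on *funcs in the parameter list marks that the remaining positional arguments are collected into a tuple. That's the one thing to watch out for.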
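To see the '*' point for yourself, and as another way to write the same loop, here's a sketch using functools.reduce. collect and pipe_reduce are names I made up for illustration; pipe_reduce should behave like the for-loop version.

```python
from functools import reduce

def collect(*funcs):
    # *funcs gathers the positional arguments into a tuple, not a list
    return funcs

def pipe_reduce(data, *funcs):
    # Fold each function over the running value, same as the for-loop version
    return reduce(lambda acc, f: f(acc), funcs, data)

print(type(collect(abs, str)).__name__)  # tuple
print(pipe_reduce(-3, abs, str, len))    # -3 -> 3 -> '3' -> 1
```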

import pprint as pp

def pipe(data, *funcs):
    for f in funcs:
        data = f(data)
    return data

inc = lambda x:  x + 1
bai = lambda x:  x * 2
def tee(x): pp.pprint(x); return x

# 5 -> 6 -> 12 -> '12' -> '1212' (tee shows it) -> 1212 -> 2424
print(pipe(5, inc, bai, str, bai, tee, int, bai))

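Giving a name to an anonymous lambda is discouraged (PEP 8 says to use def instead), but it's handy, so I use it freely anyway. tee monitors the data flowing through the pipe, named after the famous Unix command. It's the identity function (it returns its argument unchanged) with pprint attached as a side effect. Plain print can't tell a number from a string, which is why it's done this way. In Scheme, display (Gauche's print) is the human-facing one and write is the machine-readable one.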

It also shows nicely that bai (倍, "double") works on both numbers and strings.

>>>
'1212'
2424
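The pprint-versus-print distinction boils down to repr() versus str(), which roughly mirrors Scheme's write versus display. A quick sketch; show is a throwaway helper of mine:

```python
import pprint

def show(v):
    # str() is the human-facing form (cf. Scheme display);
    # pprint.pformat() uses repr(), the machine-readable form (cf. Scheme write)
    return str(v), pprint.pformat(v)

print(show("12"))  # ('12', "'12'")
print(show(12))    # ('12', '12')
```

Only the repr side keeps the quotes, which is exactly how tee lets you see '1212' is still a string.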

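A step further is something called currying. You give a multi-argument function some of its arguments ahead of time, producing a function left hanging. If you can set things up so that the function is complete (the original function can be computed) once the last argument arrives, that function can be treated as a one-argument function, and then slotting it into the pipe above is easy.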

Below is an example of toolz's curry function, which provides the machinery for this behind the scenes.

>>> from toolz import curry
>>> def mul(x, y):
...     return x * y
>>> mul = curry(mul)

>>> double = mul(2)
>>> double(10)
20
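Without toolz, the standard library's functools.partial gets you something similar by fixing leading arguments up front. Strictly speaking that's partial application rather than currying: you bind the arguments you choose in one go, instead of the function waiting for them one at a time. A sketch, with pipe redefined so the block stands alone:

```python
from functools import partial

def mul(x, y):
    return x * y

# partial fixes the first argument, leaving a one-argument function
double = partial(mul, 2)
print(double(10))  # 20

def pipe(data, *funcs):
    for f in funcs:
        data = f(data)
    return data

# One-argument functions slot straight into the pipe
print(pipe(5, double, partial(mul, 3)))  # 30
```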

That suddenly reminded me of point-free style.
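Point-free style means defining a function by composing other functions, without ever naming its argument. Here's a minimal compose sketch using functools.reduce; compose is my own helper, applied right to left (the reverse of pipe). toolz ships its own compose as well.

```python
from functools import reduce

def compose(*funcs):
    # compose(h, g, f)(x) == h(g(f(x))); with no functions it is the identity
    return reduce(lambda g, f: lambda x: g(f(x)), funcs, lambda x: x)

inc = lambda x: x + 1
bai = lambda x: x * 2

# Point-free: inc_then_bai never mentions its argument
inc_then_bai = compose(bai, inc)
print(inc_then_bai(5))  # 12
```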

trans scheme to python

Next up is re — Regular expression operations.

>>> import re
>>> seed = '1234567'
>>> pat = re.compile('34')
>>> re.search(pat, seed)
<re.Match object; span=(2, 4), match='34'>
>>> pat = re.compile('xx')
>>> re.search(pat, seed)

There's also match, which only tries at the start of the string, so don't mix the two up!
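To make the difference concrete: search scans the whole string, match only tries at position 0, and fullmatch requires the pattern to cover the whole string. Calling these as methods on the compiled pattern is the more common style, so this sketch does that:

```python
import re

seed = '1234567'
pat = re.compile('34')

print(pat.search(seed))               # finds '34' at span (2, 4)
print(pat.match(seed))                # None: the string doesn't start with '34'
print(re.compile('12').match(seed))   # matches, because '12' sits at position 0
print(pat.fullmatch(seed))            # None: '34' doesn't cover the whole string
```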

>>> xs = list(range(-5, 6))
>>> print(xs)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
>>> print( sum((i < 0 for i in xs)) )
5

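Besides list comprehensions in [], there are generator expressions in (). A list comprehension builds a list of every processed element, whereas a generator expression produces elements one at a time, so it's said to be more memory-efficient.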
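A rough way to see the memory difference is sys.getsizeof. Note that it measures only the container object itself (the list's pointer array, or the generator object), not the elements, so treat the numbers as a ballpark; exact values depend on the Python version.

```python
import sys

xs = range(-50_000, 50_001)

as_list = [i < 0 for i in xs]   # builds all 100,001 booleans up front
as_gen = (i < 0 for i in xs)    # builds nothing until consumed

# The list grows with the data; the generator object stays small
print(sys.getsizeof(as_list), sys.getsizeof(as_gen))
print(sum(as_list), sum(as_gen))  # 50000 50000
```

A generator can only be consumed once, so a second sum(as_gen) would give 0.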

collections.Counter looks like a useful class too.
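For instance, Counter can tally how many terms contain each pattern in a single pass. A sketch with a made-up small slice of the 997-step sequence and arbitrary patterns:

```python
import re
from collections import Counter

# Hypothetical sample data: the first 1000 terms of the 997-step sequence
seeds = [str(x) for x in range(0, 997 * 1000, 997)]
patterns = ['77', '12', '99']

# One count per (pattern, term) pair where the pattern occurs in the term
hits = Counter(p for p in patterns for s in seeds if re.search(p, s))
print(hits.most_common())
```

Patterns with no hits simply don't appear, and looking one up returns 0 instead of raising KeyError.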

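One way or another, I put together something with a functional-language flavor. Please spare me that cryptic format syntax, though. Did anyone really need to invent that?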

import re

seeds = [str(x) for x in range(0,100000000,997)]

def chk(qn):
    rx = re.compile(str(qn))
    res = [1 for s in seeds if re.search(rx, s)]
    print("{:>7}".format(qn), sum(res))

for p in [8888, 2828, 9696, 298250, 12345, 777]: chk(p)

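range generates the sequence a little differently from Gauche/Scheme's iota: iota takes a count, while range takes an upper bound (stop). range(0, 100000000, 997) yields 100,301 numbers instead of 100,000, so the counts differ from the Scheme results.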
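To get exactly the same numbers as Gauche, compute the stop value from the count. iota below is a hypothetical helper imitating Gauche's (iota count start step) argument order, not a standard Python function:

```python
def iota(count, start=0, step=1):
    # Exactly `count` values: start, start+step, ... (step must be non-zero)
    return range(start, start + count * step, step)

print(len(range(0, 100000000, 997)))  # 100301: range stops at an upper bound
print(len(iota(100000, 0, 997)))      # 100000: same length as the Scheme version
print(list(iota(10, 0, 997)))         # same as (iota 10 0 997) in gosh
```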

Python also runs on Windows under mingw64, so I thought I'd use it there for the first time in a while, but I couldn't launch python from emacs. I guess not having fork is fatal.

dict

Mapping Types — dict

A class out of nowhere? Have you finally woken up to OOP?

import pprint as pp

class MyClass:
    a = 33
    b = 55
    def sum(self):
        return self.a + self.b

foo = MyClass()
foo.tora = "shiba"
foo.age  = 45
foo.PI2 = 3.141593
print( foo.sum() )
print( foo.__dict__ )
pp.pprint( MyClass.__dict__ )

Nope. It's just to show that dicts are everywhere.

88
{'tora': 'shiba', 'age': 45, 'PI2': 3.141593}
mappingproxy({'__dict__': <attribute '__dict__' of 'MyClass' objects>,
              '__doc__': None,
              '__module__': '__main__',
              '__weakref__': <attribute '__weakref__' of 'MyClass' objects>,
              'a': 33,
              'b': 55,
              'sum': <function MyClass.sum at 0x2108bd60>})

Whatever you define on the instance foo gets stored in a dict. The class works the same way. It's literally a bundle of dicts.
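To see the lookup order in action: instance attributes live in the instance's own dict (vars(foo) returns foo.__dict__), and anything not found there falls back to the class's dict. A sketch reusing MyClass:

```python
class MyClass:
    a = 33
    b = 55

    def sum(self):
        return self.a + self.b

foo = MyClass()
foo.tora = "shiba"

print(vars(foo))             # {'tora': 'shiba'}: only instance attributes
print('a' in vars(foo))      # False: a lives in MyClass.__dict__
print(foo.a)                 # 33: lookup falls back to the class dict
foo.a = 1                    # now shadows the class attribute in the instance dict
print(foo.sum(), MyClass.a)  # 56 33
```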

It's all defined in Python-3.11.2/Objects/dictobject.c and Python-3.11.2/Include/cpython/dictobject.h. Python-3.11.2/Include/internal/pycore_dict.h seems to be involved too.

gc

When it comes to functional programming, you can't leave GC out of the discussion.

gc — Garbage Collector interface

>>> import pprint as pp
>>> import gc
>>> pp.pprint(gc.get_stats())
[{'collected': 485, 'collections': 20, 'uncollectable': 0},
 {'collected': 33, 'collections': 1, 'uncollectable': 0},
 {'collected': 0, 'collections': 0, 'uncollectable': 0}]

I added gc.set_debug( gc.DEBUG_STATS ) to last time's hash comparison and ran it. I expected the collections to take longer, but they were short.

gc: collecting generation 2...
gc: objects in each generation: 669 4205 0
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0009s elapsed
gc: collecting generation 2...
gc: objects in each generation: 66 0 4647
gc: objects in permanent generation: 0
gc: done, 837 unreachable, 0 uncollectable, 0.0018s elapsed
:
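For a concrete picture of what the cyclic GC is for: CPython frees most objects by reference counting, but a reference cycle keeps its own counts above zero. A small sketch that disables automatic collection just for the demo and watches the object through a weakref (Node is a made-up class):

```python
import gc
import weakref

class Node:
    pass

gc.disable()                 # turn off automatic collection for the demo
a, b = Node(), Node()
a.other, b.other = b, a      # reference cycle: refcounts never drop to zero
probe = weakref.ref(a)       # lets us check whether `a` is still alive
del a, b

print(probe() is None)       # False: reference counting alone can't free the cycle
found = gc.collect()         # the cyclic GC finds the unreachable objects and frees them
print(probe() is None)       # True
gc.enable()
```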

GC tuning in CPython

How to perform garbage collection with Python's gc module

"Garbage Collection in Python"
