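Data science (statistics + computer science + domain expertise)

Statistician's skills: building models and summarizing (ever-growing volumes of) data
Computer scientist's skills: designing and using algorithms to store, analyze, and visualize data efficiently
Domain expertise: formal training in a specialized field, enabling one both to ask the right questions and to give expert answers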

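Bin Yu's February 2020 paper "Veridical data science" proposes three principles of data science (PCS):

Predictability
Computability
Stability

B. Yu and K. Kumbier (2020) Veridical data science. PNAS 117 (8), 3920-3929. QnAs with Bin Yu.

Bin Yu: statistician; member of the American Academy of Arts and Sciences and of the U.S. National Academy of Sciences; tenured professor in the Department of Statistics and the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley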

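REPL and magic functions

Project Jupyter is a non-profit organization that aims to "develop open-source software, open standards, and services for interactive computing across dozens of programming languages". It was spun off from IPython in 2014 by Fernando Pérez. The jupyter-console terminal front end behaves essentially like the IPython shell (jupyter-console == IPython).
Jupyter has three main interactive computing products: Jupyter Notebook, JupyterHub, and JupyterLab (the next-generation version of Jupyter Notebook).
Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computing environment for creating Jupyter Notebook documents. A notebook document is a JSON file made up of an ordered list of input/output cells that can contain code, text (Markdown is supported), mathematical formulas (MathJax), charts, and rich media; its file name usually ends with the ".ipynb" extension.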

$$
\begin{aligned}
y = y(x,t) &= A e^{i\theta} \\
&= A (\cos \theta + i \sin \theta) \\
&= A (\cos(kx - \omega t) + i \sin(kx - \omega t)) \\
&= A\cos(kx - \omega t) + i A\sin(kx - \omega t) \\
&= A\cos \Big(\frac{2\pi}{\lambda}x - \frac{2\pi v}{\lambda} t \Big) + i A\sin \Big(\frac{2\pi}{\lambda}x - \frac{2\pi v}{\lambda} t \Big) \\
&= A\cos \frac{2\pi}{\lambda} (x - v t) + i A\sin \frac{2\pi}{\lambda} (x - v t)
\end{aligned}
$$

where $\theta = kx - \omega t$, with $k = 2\pi/\lambda$ and $\omega = 2\pi v/\lambda$.

import altair as alt
from vega_datasets import data

source = data.cars()

brush = alt.selection(type="interval")

points = (
    alt.Chart(source)
    .mark_point()
    .encode(
        x="Horsepower:Q",
        y="Miles_per_Gallon:Q",
        color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
    )
    .add_selection(brush)
)

bars = (
    alt.Chart(source)
    .mark_bar()
    .encode(y="Origin:N", color="Origin:N", x="count(Origin):Q")
    .transform_filter(brush)
)

points & bars

from ipywidgets import HBox, VBox, IntSlider, interactive_output
from IPython.display import display


a = IntSlider()
b = IntSlider()
out = interactive_output(lambda a, b: print(f"{a} * {b} = {a*b}"), {"a": a, "b": b})
display(HBox([VBox([a, b]), out]))

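Getting help documentation with `?`

Python's built-in help() function retrieves an object's documentation and prints it. For example, to see the documentation of the built-in sorted function: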

help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.

IPython introduces the ? character as a shorthand for accessing this documentation and other relevant information:

sorted?

# Signature: sorted(iterable, /, *, key=None, reverse=False)
# Docstring:
# Return a new list containing all items from the iterable in ascending order.

# A custom key function can be supplied to customize the sort order, and the
# reverse flag can be set to request the result in descending order.
# Type:      builtin_function_or_method

This also works for user-defined functions and other objects. The following defines a small function with a docstring:

def square(a):
    """返回a的平方"""
    return a ** 2

square?
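Outside IPython, roughly the same information can be read with the standard inspect module. The sketch below is only an approximation of what `?` displays, not how IPython implements it; the English docstring is a stand-in chosen for this illustration.

```python
import inspect


def square(a):
    """Return the square of a."""
    return a ** 2


# inspect.signature and inspect.getdoc expose the two main pieces
# that `square?` prints: the call signature and the docstring.
signature = str(inspect.signature(square))
doc = inspect.getdoc(square)

print(f"Signature: square{signature}")
print(f"Docstring: {doc}")
```

This works in any Python interpreter, which is handy in scripts where the `?` syntax is not available.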

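Getting source code with `??`

Jupyter provides a shortcut to the source code: a double question mark (??). In a session where square has been defined, the following prints the function's source; the "not found" message shown below appears only when square is not defined in the current session:

square??

Object `square` not found.

When the object is implemented in C or another compiled extension language, the source is not available, and the ?? suffix gives the same output as the ? suffix: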

sorted??

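Exploring modules with Tab completion

Jupyter supports Tab completion for exploring the contents of objects, modules, and namespaces (<TAB> below stands for the Tab key):

Tab completion of object contents
Tab completion when importing
Wildcard matching with *

Tab completion of object contents

Every Python object has attributes and methods associated with it. Python has a built-in dir function that returns a list of them, but the Tab-completion interface is more convenient: type the name of the object followed by a period (.) and then press the Tab key: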

L = [1, 2, 3]

L.<TAB>
    L.append  L.copy    L.extend  L.insert  L.remove  L.sort
    L.clear   L.count   L.index   L.pop     L.reverse

To narrow down the list, type the first character or first few characters of the name; the Tab key then finds the matching attributes and methods:

L.c<TAB>  
    L.clear  L.copy   L.count

L.co<TAB>  
    L.copy   L.count
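The same narrowing can be reproduced in plain Python with the built-in dir function mentioned earlier. This is a minimal sketch of the idea, not a description of how the completer works internally.

```python
L = [1, 2, 3]

# dir() returns every attribute name in alphabetical order;
# keep only the names that start with the typed prefix.
prefix = "c"
matches = [name for name in dir(L) if name.startswith(prefix)]

print(matches)
```

Changing prefix to "co" narrows the result further, just as typing more characters before pressing Tab does.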

If there is only a single option, pressing the Tab key completes the name for you. For example, the following is immediately replaced with L.count:

L.cou<TAB>

Python conventionally marks private attributes and methods with a leading underscore. They can be listed by explicitly typing an underscore:

L._<TAB>
    L.__add__           L.__gt__            L.__reduce__
    L.__class__         L.__hash__          L.__reduce_ex__

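For brevity, only the first two lines of the output are shown; most of these are Python's special double-underscore methods (nicknamed "dunder" methods).

Tab completion when importing

Tab completion is also useful when importing objects from packages. Here it is used to find everything in the itertools package that can be imported and starts with co: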

from itertools import co<TAB>
    combinations compress combinations_with_replacement  count

Similarly, Tab completion shows which packages can be imported on your system:
    
import <TAB>
    
import h<TAB>

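Wildcard matching with *

Jupyter also provides wildcard matching with the * character, for searching on characters in the middle or at the end of a name.

Suppose you are looking for a string method whose name contains find; you can search for it like this: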

str.*find*?
    str.find
    str.rfind

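Here the * character matches any string, including the empty string.

In practice, this kind of flexible wildcard search is very useful for finding a command.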
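A comparable wildcard search can be written with the standard fnmatch module. The sketch below only mimics the effect of `str.*find*?`; it is not IPython's own implementation.

```python
import fnmatch

# Match every attribute of str whose name contains "find";
# "*" matches any run of characters, including an empty one.
matches = fnmatch.filter(dir(str), "*find*")

print(matches)
```

The same pattern syntax also accepts ? for a single character, which the IPython shortcut cannot use because ? has another meaning there.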

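Keyboard shortcuts in the Jupyter shell

The IPython terminal shell provides four groups of Readline-style keyboard shortcuts that speed up work at the prompt. Early versions relied on the GNU Readline library; since IPython 5 the shortcuts are implemented with prompt_toolkit.

Navigation shortcuts
Text entry shortcuts
Command history shortcuts
Miscellaneous shortcuts

Navigation shortcuts

Shortcut	Action
Ctrl + a	Move the cursor to the beginning of the line
Ctrl + e	Move the cursor to the end of the line
Ctrl + b (or the left arrow key)	Move the cursor back one character
Ctrl + f (or the right arrow key)	Move the cursor forward one character

Text entry shortcuts

Shortcut	Action
Backspace key	Delete the previous character
Ctrl + d	Delete the next character
Ctrl + k	Cut text from the cursor to the end of the line
Ctrl + u	Cut text from the beginning of the line to the cursor
Ctrl + y	Yank (i.e. paste) text that was previously cut
Ctrl + t	Transpose (i.e. swap) the previous two characters

Command history shortcuts

Jupyter's command history is stored in a SQLite database in the profile directory. The following shortcuts search it:

Shortcut	Action
Ctrl + p (or the up arrow key)	Access the previous command in the history
Ctrl + n (or the down arrow key)	Access the next command in the history
Ctrl + r	Reverse-search through the command history

Miscellaneous shortcuts

Shortcut	Action
Ctrl + l	Clear the terminal screen
Ctrl + c	Interrupt the current Python command
Ctrl + d	Exit the Jupyter session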

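Jupyter magic commands

The enhancements that Jupyter adds on top of normal Python syntax are known as magic commands and are prefixed by the % character. They are designed to solve common problems in data analysis. Magic commands come in two forms:

Line magics: prefixed by a single % character, operating on a single line of input;
Cell magics: prefixed by a double %%, operating on multiple lines of input.

Pasting code blocks: `%paste` and `%cpaste`

When working in the Jupyter interpreter, pasting multi-line code blocks can lead to errors, especially when the code contains indentation and interpreter prompt markers. The %paste magic function handles this kind of multi-line, marked-up input: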

In [1]: %paste
>>> def donothing(x):
...     return x

## -- End pasted text --

The %paste command both enters and executes the code, so the function is now ready to use:

In [2]: donothing(10)
Out[2]: 10

A command with a similar purpose is %cpaste, which opens an interactive multi-line prompt in which you can paste and execute one or more chunks of code:

In [3]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:>>> def donothing(x):
:...     return x
:--

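Running external code: `%run`

It is convenient to run a code file from inside a Jupyter session rather than in a separate window. This is done with the %run magic command.

Suppose you have created a file myscript.py with the following contents: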

%%file myscript.py
def square(x):
    """求平方"""
    return x ** 2

for N in range(1, 4):
    print(N, "squared is", square(N))

Overwriting myscript.py

ls myscript.py

myscript.py

%run myscript.py

1 squared is 1
2 squared is 4
3 squared is 9

After the script has run, every function it defines is available in the Jupyter session:

square(5)

25

Timing code execution: `%timeit`

%timeit measures the execution time of the single-line Python statement that follows it.

%timeit L = [n ** 2 for n in range(1000)]

238 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

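%timeit automatically runs the statement many times to obtain a more stable result.

For multi-line statements, add a second % sign to turn it into a cell magic that handles multi-line input. For example, here is the equivalent construction with a for loop: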

%%timeit
L = []
for n in range(1000):
    L.append(n ** 2)

321 µs ± 37.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

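In this run the list comprehension (238 µs) took roughly a quarter less time than the equivalent for loop (321 µs), although the reported standard deviations are large relative to that gap.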
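The same comparison can be made outside Jupyter with the standard timeit module. The sketch below is a rough equivalent under assumed settings (5 runs of 200 calls each, chosen here only to keep it quick); the helper function names are introduced for this illustration, and the absolute timings depend on the machine.

```python
import timeit


def squares_comprehension():
    return [n ** 2 for n in range(1000)]


def squares_loop():
    result = []
    for n in range(1000):
        result.append(n ** 2)
    return result


# Each entry in the returned list is the total time of `number` calls.
comp_runs = timeit.repeat(squares_comprehension, number=200, repeat=5)
loop_runs = timeit.repeat(squares_loop, number=200, repeat=5)

# Report the fastest run, converted to microseconds per call.
print(f"comprehension: {min(comp_runs) / 200 * 1e6:.1f} µs per call")
print(f"for loop:      {min(loop_runs) / 200 * 1e6:.1f} µs per call")
```

Taking the minimum of several runs is a common convention because slower runs usually reflect interference from other processes rather than the code itself.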

%magic

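Help on magic functions: `?`, `%magic`, and `%lsmagic`

Like normal Python functions, Jupyter magic functions have docstrings, which can be accessed in the usual way:

In [10]: %timeit?

For a general description of the available magic functions, including some examples, type:

In [11]: %magic

For a list of all available magic functions, type:

In [12]: %lsmagic

You can also define your own magic functions; see the official documentation for details.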

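Input and output history

Jupyter offers several ways to access previous commands and their output, in both the shell and the Notebook:

Jupyter's input (In) and output (Out) objects
Underscore shortcuts and previous outputs
Suppressing output with a trailing semicolon
%history, %rerun, and %save

Jupyter's input and output objects

The In[1]:/Out[1]: labels are not just decoration; they also give access to the input and output history: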

import math

math.sin(2)

0.9092974268256817

math.cos(2)

-0.4161468365471424

Jupyter actually creates Python variables named In and Out, which are updated automatically to reflect this history:

# In

# Out

The In object is a list that keeps track of the commands in order (the first item is a placeholder so that In[1] refers to the first command):

print(In[1])

import math

The Out object is not a list but a dictionary that maps input numbers to their outputs (if any):

print(Out[2])

0.9092974268256817

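Not every operation has an output: import statements and print calls, for example, add nothing to Out. A print call displays text, but its return value is None, and any command whose value is None is not added to Out.

This is useful when you want to reuse earlier results. For example, check the sum of sin(2) ** 2 and cos(2) ** 2 using the previously computed values: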

Out[2] ** 2 + Out[3] ** 2

1.0

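Underscore shortcuts and previous outputs

The standard Python shell keeps the previous output in the variable _ (a single underscore); this works in Jupyter as well: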

print(_)

1.0

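Jupyter also provides a double underscore for the second-to-last output and a triple underscore for the third-to-last output (skipping any commands with no output):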

print(__)

['', 'import math', 'math.sin(2)', 'math.cos(2)', 'In', 'Out', 'print(In[1])', 'print(Out[2])', 'Out[2] ** 2 + Out[3] ** 2', 'print(_)', 'print(__)']

print(___)

-0.4161468365471424

In addition, Out[X] can be abbreviated as _X (a single underscore followed by the line number):

Out[2]

0.9092974268256817

_2

0.9092974268256817

Suppressing output

To suppress the output of a command, add a semicolon to the end of the line:

math.sin(2) + math.cos(2);

The result is computed, but it is neither displayed on the screen nor stored in Out:

25 in Out

False

Other history magic commands

To access a batch of previous inputs at once, the %history magic command is very helpful. The following prints the first four inputs:

%history -n 1-4

   1: %load ../README.md
   2:
from ipywidgets import HBox, VBox, IntSlider, interactive_output
from IPython.display import display


a = IntSlider()
b = IntSlider()
out = interactive_output(lambda a, b: print(f"{a} * {b} = {a*b}"), {"a": a, "b": b})
display(HBox([VBox([a, b]), out]))
   3:
m = 5.1
if m < 5:
    print(f"{m}小于5")
elif m % 2 == 1:
    print(f"{m}是奇数")
elif m % 2 == 0:
    print(f"{m}是偶数")
else:
    print("不是整数")
   4: 5.1 % 2

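As usual, type %history? for more information and a description of the available options. Similar magic commands are %rerun (which re-executes part of the command history) and %save (which saves part of the command history to a file). For more information, use the ? help feature (see Section 1.2).

Jupyter and shell commands

Using shell commands in Jupyter
Passing values to and from the shell
Automagic functions

Using shell commands in Jupyter

Jupyter can execute shell commands directly: anything that follows a ! on a line is run by the system shell rather than by the Python kernel.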

!ls

jupyter-python
2.data-elt
2019-01-01-kivy-perface.ipynb
2019-02-01-kivy-ch1-clock-app.ipynb
2019-03-01-kivy-ch2-paint-app.ipynb
2019-04-01-kivy-ch3-sound-recorder-for-android.ipynb
2019-05-01-kivy-ch4-chat-app.ipynb
2019-06-01-kivy-ch5-remote-desktop-app.ipynb
2019-07-01-kivy-ch6-2048-app.ipynb
2019-08-01-kivy-ch7-flappy-bird-app.ipynb
2019-09-01-kivy-ch8-shaders-app.ipynb
2019-10-01-kivy-ch9-shmup-app.ipynb
2020-02-20-test.ipynb
2020-05-08-jupyter-python.ipynb
2020-05-15-data-etl.ipynb
2020-05-22-data-viz-1.ipynb
2020-05-29-data-viz-2.ipynb
2020-06-08-data-stats.ipynb
3.data-viz
4.data-stats
README.md
kbpic
my_icons

!pwd

/Users/toddtao/Documents/air/_notebooks

!echo "数据科学"

数据科学

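Passing values to and from the shell

Shell commands can not only be called from Jupyter but can also interact with the Jupyter namespace. For example, the output of any shell command can be saved to a Python list using the assignment operator: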

contents = !ls  
contents

['1.jupyter和python.ipynb',
 'drewconway.png',
 'enumerate.png',
 'harry_potter',
 'HW',
 'HW1.jupyter和python.ipynb',
 'jupyter.png',
 'map.png',
 'myscript2.py',
 'myscript.py',
 'tale-of-two-cities.txt',
 'zip.png',
 '猜数字.png']

directory = !pwd 
directory

['/home/junjiet/data_science2020/1.jupyter和python']

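These results are not returned as a plain list. The return type can be used like a list, but it has additional functionality, such as the grep and fields methods and the s, n, and p properties, which make it easy to search, filter, and display the results.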

type(directory)

IPython.utils.text.SList
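In a plain Python script, where the ! syntax is not available, a similar effect can be had with the standard subprocess module. The sketch below is an illustration under assumptions made for this example: it launches a second Python interpreter as a stand-in for a shell command so that it runs on any platform, and it returns an ordinary list rather than an SList.

```python
import subprocess
import sys

message = "hello from Python"

# Run a child process, passing a Python value in as an argument
# and capturing its standard output as text.
completed = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(sys.argv[1]); print('second line')", message],
    capture_output=True,
    text=True,
    check=True,
)

# One list item per output line, similar in spirit to `contents = !command`.
lines = completed.stdout.splitlines()
print(lines)
```

Passing the command as a list of arguments avoids shell quoting problems that can arise when a value contains spaces.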

Communication in the other direction, passing Python variables into the shell, is possible through the {varname} syntax:

message = "美妙的Python"

!echo {message}

美妙的Python

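The variable name is enclosed in curly braces, and the variable's value is substituted into the shell command.

Shell-related magic commands

You cannot change directories with !cd, because shell commands in the Notebook are executed in a temporary subshell. To change the working directory in a more lasting way, use the %cd magic command:

%cd ..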

/home/junjiet/data_science2020

In fact, plain cd does the same thing:

cd ..

/home/junjiet

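This is known as an automagic function. The behavior is enabled by default and can be toggled with the %automagic magic function.

Besides %cd, other shell-like magic functions are %cat, %cp, %env, %ls, %man, %mkdir, %more, %mv, %pwd, %rm, and %rmdir; with automagic on, any of them can be used without the % sign.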

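Errors and debugging

Code development and data analysis always involve some debugging. Jupyter's debugging tools:

%xmode: controls how much information is printed in a traceback
%debug: starts a debugging session in the dedicated ipdb debugger (an enhanced version of pdb)
%run -d: runs a script under the interactive debugger, where the next command steps through the code line by line

Commonly used debugging commands:

Command	Description
`list`	Show the current location in the file
`h(elp)`	Show a list of commands, or find help on a specific command
`q(uit)`	Quit the debugger and the program
`c(ontinue)`	Quit the debugger and continue running the program
`n(ext)`	Go to the next step of the program
`<enter>`	Repeat the previous command
`p(rint)`	Print variables
`s(tep)`	Step into a subroutine
`r(eturn)`	Return out of a subroutine

For more information, use the help command in the debugger, or see ipdb's online documentation.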

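Profiling and optimizing code

"Premature optimization is the root of all evil." (Donald Knuth)

Jupyter provides a range of functions for timing and profiling code:

%time: time the execution of a single statement
%timeit: time repeated executions of a single statement for more accuracy
%prun: run code with the profiler
%lprun: run code with the line-by-line profiler; first install it with pip install line_profiler, then load it with %load_ext line_profiler
%memit: measure the memory use of a single statement; first install it with pip install memory_profiler, then load it with %load_ext memory_profiler
%mprun: run code with the line-by-line memory profiler (also provided by memory_profiler)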
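For readers working outside Jupyter, the kind of report that %prun produces can be approximated with the standard cProfile and pstats modules. This is a minimal sketch; the function sum_of_squares is a made-up workload for this illustration.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    return sum(i * i for i in range(n))


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
total = sum_of_squares(100_000)
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that dominate the total run time at the top, which is usually the first thing to look at.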

For details, see Chapter 1 of the Python Data Science Handbook, "IPython: Beyond Normal Python".

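Python basics

A "guess the number" game introduces the basics of Python programming, covering input and output, modules, functions, control flow, and data structures. The program flowchart is as follows:

[Figure: flowchart of the guess-the-number game]

Game code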

# 猜数字游戏
import random

m, n = 30, 5  # 最大值和猜测次数
x = random.randrange(1, m)
name = input("Hello! 你是谁?")
print(f"欢迎你，{name}同学，我在1到{m}之间选了一个整数，共{n}次机会，你猜猜看？")
for i in range(n):
    guess = int(input(f"{name}同学猜是："))
    if guess == x:
        print(f"猜中啦，{name}同学！就是{x}，赶紧去买注大乐透吧！")
        break
    else:
        print("猜大了~", end="") if guess > x else print("猜小了~", end="")
        print(f"还有{n-i-1}次机会，再猜猜看：") if i + 1 < n else print("")
else:
    print(f">:<没猜中，我想的是{x}啊！")

Hello! 你是谁?tutu
欢迎你，tutu同学，我在1到30之间选了一个整数，共5次机会，你猜猜看？
tutu同学猜是：15
猜中啦，tutu同学！就是15，赶紧去买注大乐透吧！

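The import statement and the random module

Line 1 is a Python comment. A comment starts with #, and the interpreter ignores everything after the # on that line.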

# 猜数字游戏

Line 2 is an import statement, which imports the random module:

import random

A statement is an instruction that performs an action, whereas an expression is evaluated and yields a value.

Line 4 is an assignment statement that stores 30 and 5 in the variables m and n:

m, n = 30, 5  # 最大值和猜测次数

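Literals

Literals denote constant values of some built-in types.

Numbers

Integers (int), floating-point numbers (float), and complex numbers (complex). As with a calculator, you type an expression and get an answer: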

3 + 4 * 5 / 6

6.333333333333334

20 // 3, 20 % 3, 20 ** 3

(6, 2, 8000)

import sys

sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

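Strings

Strings can be written in several ways, enclosed in single quotes ('') or double quotes (""). The backslash \ introduces an escape sequence, so the \n below is read as a newline: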

print("C:\some\name")

C:\some
ame
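When the backslashes are meant literally, as in a Windows path, two standard remedies are doubling each backslash or using a raw string. A minimal sketch:

```python
# Each backslash is escaped explicitly.
escaped = "C:\\some\\name"

# Raw string: the r prefix keeps backslashes as ordinary characters.
raw = r"C:\some\name"

print(raw)
print(escaped == raw)
```

Raw strings are also the usual choice for regular expression patterns, for the same reason.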

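String literals can span multiple lines when written with triple quotes: """...""" or '''...'''. Line breaks are automatically included in the string; to prevent this, add a \ at the end of the line, as in the following example: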

print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")

Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

Strings can be concatenated (glued together) with + and repeated with *:

3 * "un" + "ium"

'unununium'

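A long string can be written as several pieces enclosed in parentheses; adjacent string literals are joined automatically: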

text = ('很长的字符串用括号'
      '包裹多个片段即可')
text

'很长的字符串用括号包裹多个片段即可'

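Functions

The keyword def creates a function; it is followed by the function name and a parenthesized list of formal parameters.

Line 5 calls the randrange() function of the random module. randrange(1, m) returns a random integer x with 1 <= x < m; the upper bound m itself is never returned, so with m = 30 the game picks from 1 to 29. Use random.randint(1, m) if m should be included.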

x = random.randrange(1, m)
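The difference between the two bounds conventions is easy to check empirically. The sketch below uses a small m and a fixed seed, both chosen only for this illustration, and collects the distinct values each function produces.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
m = 5

# randrange excludes the stop value; randint includes both endpoints.
exclusive = {random.randrange(1, m) for _ in range(1000)}
inclusive = {random.randint(1, m) for _ in range(1000)}

print(sorted(exclusive))
print(sorted(inclusive))
```

The half-open convention of randrange matches range and slicing, which is why randrange(1, m + 1) and randint(1, m) are interchangeable.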

# Define the Fibonacci series
def fib(n):
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=" ")
        a, b = b, a + b
    print()

# Call the function
fib(2000)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597

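A function returns a value with the return statement; a function without a return statement still returns a value, None.

# Define the Fibonacci series
def fib_return(n):
    """Return a list containing the Fibonacci series up to n."""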
    a, b = 0, 1
    fib = []
    while a < n:
        fib.append(a)
        a, b = b, a + b
    return fib

fib2000 = fib_return(2000)
fib2000

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]
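The implicit None can be observed directly by capturing the result of a function that only prints. The function name show is a placeholder used for this sketch.

```python
def show(value):
    print(value)  # no return statement


result = show("hi")

# The call printed "hi", but the value handed back is None.
print(result is None)
```

This is why assigning the result of a printing function, such as the earlier fib, to a variable gives nothing useful to work with.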

random.randrange??

Input and output

Line 6 uses the input() function to read the user's input:

name = input("Hello! 你是谁?")

Hello! 你是谁?tutu

name

'tutu'

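Line 7 prints with the print() function. An f-string can reference variables or expressions between { and } (new in Python 3.6; the older alternatives are C-style % formatting and str.format):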

print(f"欢迎你，{name}同学，我在1到{m}之间选了一个整数，共{n}次机会，你猜猜看？")

欢迎你，tutu同学，我在1到30之间选了一个整数，共5次机会，你猜猜看？
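The three formatting styles can be compared side by side. The sketch below builds the same string each way; the sample text is invented for this illustration.

```python
name, m, n = "tutu", 30, 5

# f-string: expressions are written inline between braces.
with_f = f"{name}: 1 to {m}, {n} tries"

# str.format: placeholders are filled from the arguments in order.
with_format = "{}: 1 to {}, {} tries".format(name, m, n)

# C-style % formatting: %s for strings, %d for integers.
with_percent = "%s: 1 to %d, %d tries" % (name, m, n)

print(with_f)
print(with_f == with_format == with_percent)
```

The f-string keeps each value next to the place it appears, which tends to be easier to read as the number of fields grows.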

def mod(a,b):
    return a % b

import dis

dis.dis(mod)

  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_MODULO
              6 RETURN_VALUE

mod('hello%s','world')

'helloworld'

print(f"{1+2*3}")

7

Reading and writing files is an even more common form of input and output:

with open("myscript.py", "r") as f:
    txt = f.read()
    print(txt)

def square(x):
    """求平方"""
    return x ** 2

for N in range(1, 4):
    print(N, "squared is", square(N))

with open("myscript2.py", "w") as f:
    f.write(txt)

cat myscript2.py

def square(x):
    """求平方"""
    return x ** 2

for N in range(1, 4):
    print(N, "squared is", square(N))

with open("myscript.py", "r") as f:
    txt = f.readlines()
txt

['def square(x):\n',
 '    """求平方"""\n',
 '    return x ** 2\n',
 '\n',
 'for N in range(1, 4):\n',
 '    print(N, "squared is", square(N))\n']

with open("myscript2.py", "w") as f:
    f.writelines(txt)

cat myscript2.py

def square(x):
    """求平方"""
    return x ** 2

for N in range(1, 4):
    print(N, "squared is", square(N))

Control flow

for...[else], break, continue
if...else
while...[else]

for

Lines 8 to 17 form a for...else loop, which iterates over each element of a sequence or other iterable object:

for i in range(n):
...
else:
    print(f">:<没猜中，我想的是{x}啊！")
...

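The else clause is optional. It runs once the loop has finished iterating over all elements, and it is skipped if the loop is terminated by break: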

for i in range(10):
    print(i ** 2)
else:
    print("game over")

0
1
4
9
16
25
36
49
64
81
game over

for i in range(10):
    if i > 5:
        break
    print(i ** 2)
else:
    print("game over")

0
1
4
9
16
25

for i in range(10):
    if i > 5:
        continue
    print(i ** 2)
else:
    print("game over")

0
1
4
9
16
25
game over
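A typical use of for...else is a search: break when the item is found, and let the else clause handle the case where nothing matched. The function first_negative is a hypothetical helper written for this sketch.

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for value in numbers:
        if value < 0:
            found = value
            break  # leaving via break skips the else clause
    else:
        found = None  # reached only when the loop was never broken
    return found


print(first_negative([3, -1, -2]))
print(first_negative([1, 2]))
```

The guessing game uses the same pattern: a correct guess breaks out of the loop, and the else clause reports the answer only when every attempt failed.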

if...else...

Lines 10 to 13 form an if...else conditional statement:

if guess == x:
        print(f"猜中啦，{name}同学！就是{x}，赶紧去买注大乐透吧！")
        break
    else:
        ...

m = 5.1
if m < 5:
    print(f"{m}小于5")
elif m % 2 == 1:
    print(f"{m}是奇数")
elif m % 2 == 0:
    print(f"{m}是偶数")
else:
    print("不是整数")

不是整数

5.1 % 2

1.0999999999999996

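This is a floating-point precision issue; the decimal module in the standard library is recommended when exact decimal arithmetic is needed.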
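A minimal sketch of the same remainder computed with decimal: constructing the Decimal from a string keeps the value exactly 5.1 instead of the nearest binary fraction.

```python
from decimal import Decimal

# Binary floating point cannot represent 5.1 exactly.
float_remainder = 5.1 % 2

# Decimal arithmetic works on the decimal digits as written.
exact_remainder = Decimal("5.1") % 2

print(float_remainder)
print(exact_remainder)
```

Note that Decimal(5.1), built from the float rather than the string, would inherit the float's rounding error.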

Lesser-known Python features: https://github.com/satwikkansal/wtfpython

while

The other loop statement; it also supports an optional else clause:

i = 0
while i < 10:
    if i > 5:
        break
    print(i ** 2)
    i += 1
else:
    print("game over")

0
1
4
9
16
25

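Operators

+       -       *       **      /       //      %      @ (new in Python 3.5)
<<      >>      &       |       ^       ~       := (new in Python 3.8)
<       >       <=      >=      ==      !=

Operator precedence, from lowest (least binding) to highest (most binding):

Operator	Description
`:=`	Assignment expression
`lambda`	Lambda expression
`if` -- `else`	Conditional expression
`or`	Boolean OR
`and`	Boolean AND
`not` `x`	Boolean NOT
`in`, `not in`, `is`, `is not`, `<`, `<=`, `>`, `>=`, `!=`, `==`	Comparisons, including membership tests and identity tests
`|`	Bitwise OR
`^`	Bitwise XOR
`&`	Bitwise AND
`<<`, `>>`	Shifts
`+`, `-`	Addition and subtraction
`*`, `@`, `/`, `//`, `%`	Multiplication, matrix multiplication, division, floor division, remainder
`+x`, `-x`, `~x`	Positive, negative, bitwise NOT
`**`	Exponentiation
`await` `x`	Await expression
`x[index]`, `x[index:index]`, `x(arguments...)`, `x.attribute`	Subscription, slicing, call, attribute reference
`(expressions...)`, `[expressions...]`, `{key: value...}`, `{expressions...}`	Binding or parenthesized expression, list display, dictionary display, set display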
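A few expressions make the ordering concrete. In the sketch below each entry is evaluated exactly as written, and the comment names the rule that decides the result.

```python
checks = {
    "2 + 3 * 4": 2 + 3 * 4,          # * binds tighter than +
    "-2 ** 2": -2 ** 2,              # ** binds tighter than unary minus
    "1 | 2 ^ 3 & 4": 1 | 2 ^ 3 & 4,  # & first, then ^, then |
    "not 1 == 2": not 1 == 2,        # == binds tighter than not
    "1 + 1 << 2": 1 + 1 << 2,        # + binds tighter than <<
}

for expression, value in checks.items():
    print(f"{expression} -> {value}")
```

When an expression mixes several of these levels, explicit parentheses are usually clearer than relying on the table.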

A conditional expression is also called the ternary operator:

cmp = "2 > 1" if 2 > 1 else "2 < 1"
cmp

'2 > 1'

Lines 14 and 15 both use conditional expressions:

print("猜大了~", end="") if guess > x else print("猜小了~", end="")
print(f"还有{n-i-1}次机会，再猜猜看：") if i + 1 < n else print("")

猜小了~

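Data structures

range
list
tuple
set
dict

range

Line 8 calls range(), which returns a range object. range is one of Python's three basic sequence types, the other two being list and tuple. range(n) contains all the integers in [0, n-1]. The general form is range(start, stop[, step]).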

range(10)

range(0, 10)

type(range(10))

range

tuple(range(10))

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
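The start, stop, and step arguments can be explored with a few small ranges. This is a short sketch; the variable names are chosen for this illustration.

```python
evens = range(0, 10, 2)       # start at 0, stop before 10, step 2
countdown = range(10, 0, -3)  # a negative step counts downward

print(list(evens))
print(list(countdown))

# A range supports len, indexing, and membership tests
# without materializing its elements.
print(len(evens), evens[2], 4 in evens)
```

Because the elements are computed on demand, even a very large range takes only a small, fixed amount of memory.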

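list

Lists are mutable sequences, typically used to store collections of homogeneous items. A list can be constructed in several ways:

Using a pair of square brackets to denote the empty list: []
Using square brackets, separating items with commas: [a], [a, b, c]
Using a list comprehension: [x for x in iterable]
Using the type constructor: list() or list(iterable)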

harry_potter = [
    "魔法石",
    "密室",
    "阿兹卡班囚徒",
    "火焰杯",
    "凤凰社",
    "混血王子",
    "死亡圣器",
    "被诅咒的孩子",
]

harry_potter

['魔法石', '密室', '阿兹卡班囚徒', '火焰杯', '凤凰社', '混血王子', '死亡圣器', '被诅咒的孩子']

type(harry_potter)

list

len(harry_potter)

8

harry_potter.index("凤凰社")

4

harry_potter[4]

'凤凰社'

harry_potter.pop()

'被诅咒的孩子'

harry_potter

['魔法石', '密室', '阿兹卡班囚徒', '火焰杯', '凤凰社', '混血王子', '死亡圣器']

harry_potter.append("被诅咒的孩子")
harry_potter

['魔法石', '密室', '阿兹卡班囚徒', '火焰杯', '凤凰社', '混血王子', '死亡圣器', '被诅咒的孩子']

for book in harry_potter:
    print(f"《哈利·波特与{book}》")

《哈利·波特与魔法石》
《哈利·波特与密室》
《哈利·波特与阿兹卡班囚徒》
《哈利·波特与火焰杯》
《哈利·波特与凤凰社》
《哈利·波特与混血王子》
《哈利·波特与死亡圣器》
《哈利·波特与被诅咒的孩子》

[f"《哈利·波特与{book}》" for book in harry_potter]

['《哈利·波特与魔法石》',
 '《哈利·波特与密室》',
 '《哈利·波特与阿兹卡班囚徒》',
 '《哈利·波特与火焰杯》',
 '《哈利·波特与凤凰社》',
 '《哈利·波特与混血王子》',
 '《哈利·波特与死亡圣器》',
 '《哈利·波特与被诅咒的孩子》']

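tuple

Tuples are immutable sequences, typically used to store collections of heterogeneous data, or wherever an immutable sequence of homogeneous data is needed. A tuple can be constructed in several ways:

Using a pair of parentheses to denote the empty tuple: ()
Using a trailing comma for a singleton tuple: a, or (a,)
Separating items with commas: a, b, c or (a, b, c)
Using the tuple() built-in: tuple() or tuple(iterable)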
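The construction rules above imply that the comma, not the parentheses, makes a tuple. A short sketch with illustrative values:

```python
not_a_tuple = (42)   # parentheses alone only group: this is an int
single = (42,)       # the trailing comma makes a one-element tuple
also_single = 42,    # parentheses are optional

point = 3, 4         # packing two values into a tuple
x, y = point         # unpacking them again

print(type(not_a_tuple).__name__, type(single).__name__)
print(single == also_single, x, y)
```

Unpacking is what makes statements such as m, n = 30, 5 in the game code work.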

harry_potter = (
    ("哈利·波特", "魔法石"),
    ("哈利·波特", "密室"),
    ("哈利·波特", "阿兹卡班囚徒"),
    ("哈利·波特", "火焰杯"),
    ("哈利·波特", "凤凰社"),
    ("哈利·波特", "混血王子"),
    ("哈利·波特", "死亡圣器"),
    ("哈利·波特", "被诅咒的孩子"),
)

for i, book in enumerate(harry_potter):
    print(f"第{i+1}本《{'与'.join(book)}》")

第1本《哈利·波特与魔法石》
第2本《哈利·波特与密室》
第3本《哈利·波特与阿兹卡班囚徒》
第4本《哈利·波特与火焰杯》
第5本《哈利·波特与凤凰社》
第6本《哈利·波特与混血王子》
第7本《哈利·波特与死亡圣器》
第8本《哈利·波特与被诅咒的孩子》

set

A set is an unordered collection of distinct hashable objects. Common uses include:

Membership testing

"哈利" in {"哈利", "波特"}

True

Removing duplicates from a sequence

set(("哈利", "波特", "哈利", "波特"))

{'哈利', '波特'}

Mathematical set operations such as intersection, union, difference, and symmetric difference

harry_potter_books = set(x for y in harry_potter for x in y)
harry_potter_books

{'凤凰社', '哈利·波特', '密室', '死亡圣器', '混血王子', '火焰杯', '被诅咒的孩子', '阿兹卡班囚徒', '魔法石'}

magic, secret = set(harry_potter[0]), set(harry_potter[1])
magic, secret

({'哈利·波特', '魔法石'}, {'哈利·波特', '密室'})

magic & harry_potter_books

{'哈利·波特', '魔法石'}

harry_potter_books - magic

{'凤凰社', '密室', '死亡圣器', '混血王子', '火焰杯', '被诅咒的孩子', '阿兹卡班囚徒'}

magic | secret

{'哈利·波特', '密室', '魔法石'}

magic ^ secret

{'密室', '魔法石'}

magic.add("死亡圣器")
magic

{'哈利·波特', '死亡圣器', '魔法石'}

frozenset is the immutable counterpart of set; it does not support add, remove, pop, update, or other mutating operations:

magic2 = frozenset(magic) 

magic2

frozenset({'哈利·波特', '死亡圣器', '魔法石'})

magic2.add

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-97-02238ce56a10> in <module>
----> 1 magic2.add

AttributeError: 'frozenset' object has no attribute 'add'
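What immutability buys in return is hashability: a frozenset can be a dictionary key or a member of another set, which an ordinary set cannot. A minimal sketch with made-up values:

```python
pair = frozenset({"a", "b"})

# A frozenset is hashable, so it can serve as a dictionary key.
index = {pair: "first pair"}

# Equal frozensets collapse to one element inside a set.
nested = {frozenset({1, 2}), frozenset({2, 1})}

# An ordinary set is unhashable and is rejected as a key.
try:
    {{"a", "b"}: "x"}
    set_as_key_allowed = True
except TypeError:
    set_as_key_allowed = False

print(index[frozenset({"b", "a"})], len(nested), set_as_key_allowed)
```

Lookup works with any equal frozenset, regardless of the order in which its elements were written.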

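dict

A mapping type made up of key: value pairs. Keys are unique, and lists, dictionaries, and other mutable types cannot be used as keys.

CPython 3.6 reimplemented the dictionary so that it preserves insertion order; since Python 3.7 this ordering is guaranteed by the language.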
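Both properties can be checked in a few lines. The sketch below assumes Python 3.7 or later, where insertion order is guaranteed; the keys are arbitrary.

```python
years = {}
years["c"] = 3
years["a"] = 1
years["b"] = 2

# Iteration follows insertion order, not alphabetical order.
key_order = list(years)

# A list is mutable and therefore unhashable: it cannot be a key.
try:
    {[1, 2]: "x"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# A tuple of immutable items works where the list did not.
tuple_keyed = {(1, 2): "ok"}

print(key_order, list_key_allowed, tuple_keyed[(1, 2)])
```

Converting a list to a tuple is the usual way to use a sequence of values as a key.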

harry_potter1 = dict(
    魔法石=1997, 密室=1998, 阿兹卡班囚徒=1999, 火焰杯=2000, 凤凰社=2003, 混血王子=2005, 死亡圣器=2007,
)

harry_potter2 = {
    "魔法石": 1997,
    "密室": 1998,
    "阿兹卡班囚徒": 1999,
    "火焰杯": 2000,
    "凤凰社": 2003,
    "混血王子": 2005,
    "死亡圣器": 2007,
}

harry_potter1 == harry_potter2

True

harry_potter3 = dict(
    zip(
        ("魔法石", "密室", "阿兹卡班囚徒", "火焰杯", "凤凰社", "混血王子", "死亡圣器",),
        (1997, 1998, 1999, 2000, 2003, 2005, 2007,),
    )
)

harry_potter1 == harry_potter2 == harry_potter3

True

harry_potter2["凤凰社"]

2003

for book, year in harry_potter2.items():
    print(f"《哈利·波特与{book}》出版于{year}年")

《哈利·波特与魔法石》出版于1997年
《哈利·波特与密室》出版于1998年
《哈利·波特与阿兹卡班囚徒》出版于1999年
《哈利·波特与火焰杯》出版于2000年
《哈利·波特与凤凰社》出版于2003年
《哈利·波特与混血王子》出版于2005年
《哈利·波特与死亡圣器》出版于2007年

harry_potter2["被诅咒的孩子"] = 2016
harry_potter2

{'魔法石': 1997,
 '密室': 1998,
 '阿兹卡班囚徒': 1999,
 '火焰杯': 2000,
 '凤凰社': 2003,
 '混血王子': 2005,
 '死亡圣器': 2007,
 '被诅咒的孩子': 2016}

harry_potter2.pop("被诅咒的孩子")

2016

harry_potter2

{'魔法石': 1997,
 '密室': 1998,
 '阿兹卡班囚徒': 1999,
 '火焰杯': 2000,
 '凤凰社': 2003,
 '混血王子': 2005,
 '死亡圣器': 2007}

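Functional programming

Python's immutable objects and higher-order functions allow more concise expression and make fast parallel computation easier.

Mutable and immutable objects
max, min, sum, any, all, sorted, reversed
filter, map, reduce
multiprocessing and concurrent.futures
Third-party higher-order function packages: cytoolz, fn, and fancy

Mutable and immutable objects

Everything in Python is an object, and objects come in two kinds: mutable and immutable
Once instantiated, an object has a unique id; a mutable object can change its state or contents, whereas an immutable object cannot be modified
Immutable objects: int, float, bool, str, tuple*, namedtuple, frozenset; they are fast to read and save memory
Mutable objects: list, dict, set, user-defined classes
*Caveat for tuple: the tuple itself cannot be modified, but a mutable object stored inside it can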

tup = ([3, 4, 5], 'two')

tup[0][-1] = 1

tup

([3, 4, 1], 'two')
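The caveat has two sides, which the following sketch separates: rebinding a slot of the tuple is refused, while mutating the list stored in that slot is allowed. A tuple that contains a list also loses its hashability.

```python
tup = ([3, 4, 5], "two")

# Rebinding a slot of the tuple is refused.
try:
    tup[0] = [0]
    rebinding_allowed = True
except TypeError:
    rebinding_allowed = False

# Mutating the list stored in the slot is allowed.
tup[0].append(6)

# A tuple holding a mutable item cannot be hashed.
try:
    hash(tup)
    hashable = True
except TypeError:
    hashable = False

print(tup, rebinding_allowed, hashable)
```

So a tuple is only safe as a dictionary key or set element when everything inside it is immutable as well.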

collections.namedtuple

	symbol	name	current	percent	pe_ttm	current_year_percent	volume
0	SH601398	工商银行	5.17	0.39	5.855	-12.07	104063910
1	SH601939	建设银行	6.43	0.16	5.939	-11.07	51089825
2	SH600519	贵州茅台	1265.70	-0.72	36.908	6.99	2466087
3	SH601318	中国平安	74.46	0.62	10.474	-12.87	48625473
4	SH601288	农业银行	3.46	0.58	5.631	-6.23	118473509
5	SH601988	中国银行	3.48	0.58	5.420	-5.69	79563490
6	SH600036	招商银行	35.09	0.20	9.274	-6.63	71533247
7	SH601857	中国石油	4.44	1.37	42.328	-23.84	85973295
8	SH601628	中国人寿	28.54	-0.73	16.348	-18.15	16644522
9	SH600028	中国石化	4.46	0.45	23.430	-12.72	119640204

data = [{"symbol": "SH601398","name": "工商银行","current": 5.17,"percent": 0.39,"pe_ttm": 5.855,"current_year_percent": -12.07,"volume": 104063910,"industry": "银行",},
{"symbol": "SH601939","name": "建设银行","current": 6.43,"percent": 0.16,"pe_ttm": 5.939,"current_year_percent": -11.07,"volume": 51089825,"industry": "银行",},
{"symbol": "SH600519","name": "贵州茅台","current": 1265.7,"percent": -0.72,"pe_ttm": 36.908,"current_year_percent": 6.99,"volume": 2466087,"industry": "白酒",},
{"symbol": "SH601318","name": "中国平安","current": 74.46,"percent": 0.62,"pe_ttm": 10.474,"current_year_percent": -12.87,"volume": 48625473,"industry": "保险",},
{"symbol": "SH601288","name": "农业银行","current": 3.46,"percent": 0.58,"pe_ttm": 5.631,"current_year_percent": -6.23,"volume": 118473509,"industry": "银行",},
{"symbol": "SH601988","name": "中国银行","current": 3.48,"percent": 0.58,"pe_ttm": 5.42,"current_year_percent": -5.69,"volume": 79563490,"industry": "银行",},
{"symbol": "SH600036","name": "招商银行","current": 35.09,"percent": 0.2,"pe_ttm": 9.274,"current_year_percent": -6.63,"volume": 71533247,"industry": "银行",},
{"symbol": "SH601857","name": "中国石油","current": 4.44,"percent": 1.37,"pe_ttm": 42.328,"current_year_percent": -23.84,"volume": 85973295,"industry": "石化",},
{"symbol": "SH601628","name": "中国人寿","current": 28.54,"percent": -0.73,"pe_ttm": 16.348,"current_year_percent": -18.15,"volume": 16644522,"industry": "保险",},
{"symbol": "SH600028","name": "中国石化","current": 4.46,"percent": 0.45,"pe_ttm": 23.43,"current_year_percent": -12.72,"volume": 119640204,"industry": "石化",}]

from collections import namedtuple

Stocks = namedtuple(
    "Stocks",
    [
        "symbol",
        "name",
        "current",
        "percent",
        "pe_ttm",
        "current_year_percent",
        "volume",
        "industry",
    ],
)

Stocks

__main__.Stocks

icbc = Stocks(
    symbol="SH601398",
    name="工商银行",
    current=5.17,
    percent=0.39,
    pe_ttm=5.855,
    current_year_percent=-12.07,
    volume=104063910,
    industry="银行",
)

icbc

Stocks(symbol='SH601398', name='工商银行', current=5.17, percent=0.39, pe_ttm=5.855, current_year_percent=-12.07, volume=104063910, industry='银行')

icbc.name

'工商银行'

icbc.name = 'dd'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-118-74e3a93185c6> in <module>
----> 1 icbc.name = 'dd'

AttributeError: can't set attribute

stocks = tuple(Stocks(**_) for _ in data)

stocks

(Stocks(symbol='SH601398', name='工商银行', current=5.17, percent=0.39, pe_ttm=5.855, current_year_percent=-12.07, volume=104063910, industry='银行'),
 Stocks(symbol='SH601939', name='建设银行', current=6.43, percent=0.16, pe_ttm=5.939, current_year_percent=-11.07, volume=51089825, industry='银行'),
 Stocks(symbol='SH600519', name='贵州茅台', current=1265.7, percent=-0.72, pe_ttm=36.908, current_year_percent=6.99, volume=2466087, industry='白酒'),
 Stocks(symbol='SH601318', name='中国平安', current=74.46, percent=0.62, pe_ttm=10.474, current_year_percent=-12.87, volume=48625473, industry='保险'),
 Stocks(symbol='SH601288', name='农业银行', current=3.46, percent=0.58, pe_ttm=5.631, current_year_percent=-6.23, volume=118473509, industry='银行'),
 Stocks(symbol='SH601988', name='中国银行', current=3.48, percent=0.58, pe_ttm=5.42, current_year_percent=-5.69, volume=79563490, industry='银行'),
 Stocks(symbol='SH600036', name='招商银行', current=35.09, percent=0.2, pe_ttm=9.274, current_year_percent=-6.63, volume=71533247, industry='银行'),
 Stocks(symbol='SH601857', name='中国石油', current=4.44, percent=1.37, pe_ttm=42.328, current_year_percent=-23.84, volume=85973295, industry='石化'),
 Stocks(symbol='SH601628', name='中国人寿', current=28.54, percent=-0.73, pe_ttm=16.348, current_year_percent=-18.15, volume=16644522, industry='保险'),
 Stocks(symbol='SH600028', name='中国石化', current=4.46, percent=0.45, pe_ttm=23.43, current_year_percent=-12.72, volume=119640204, industry='石化'))

stocks[0].name

'工商银行'

max, min, sum, any, all, sorted, reversed

max(stocks)

Stocks(symbol='SH601988', name='中国银行', current=3.48, percent=0.58, pe_ttm=5.42, current_year_percent=-5.69, volume=79563490, industry='银行')

max(stocks, key=lambda _: _.current)

Stocks(symbol='SH600519', name='贵州茅台', current=1265.7, percent=-0.72, pe_ttm=36.908, current_year_percent=6.99, volume=2466087, industry='白酒')

min(stocks)

Stocks(symbol='SH600028', name='中国石化', current=4.46, percent=0.45, pe_ttm=23.43, current_year_percent=-12.72, volume=119640204, industry='石化')

min(stocks, key=lambda _: _.current)

Stocks(symbol='SH601288', name='农业银行', current=3.46, percent=0.58, pe_ttm=5.631, current_year_percent=-6.23, volume=118473509, industry='银行')
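Without a key, max and min compare namedtuples the way they compare ordinary tuples: field by field from the left, so the symbol string decides the result. The sketch below uses a reduced two-field record, Quote, invented for this illustration, with three symbol and price pairs taken from the data above.

```python
from collections import namedtuple

# Hypothetical two-field record, smaller than the Stocks type above.
Quote = namedtuple("Quote", ["symbol", "price"])

quotes = (
    Quote("SH601398", 5.17),
    Quote("SH601988", 3.48),
    Quote("SH600519", 1265.70),
)

# Default comparison is lexicographic, so the first field (symbol) wins.
by_default = max(quotes)

# A key function chooses the field to compare on.
by_price = max(quotes, key=lambda q: q.price)

print(by_default.symbol, by_price.symbol)
```

This explains why the plain max(stocks) call returns the record with the largest symbol rather than the highest price.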

sum((1, 2, 3, 0))

6

any((1, 2, 3, 0))

True

all((1, 2, 3, 0))

False

sorted(stocks, key=lambda _: _.current)

[Stocks(symbol='SH601288', name='农业银行', current=3.46, percent=0.58, pe_ttm=5.631, current_year_percent=-6.23, volume=118473509, industry='银行'),
 Stocks(symbol='SH601988', name='中国银行', current=3.48, percent=0.58, pe_ttm=5.42, current_year_percent=-5.69, volume=79563490, industry='银行'),
 Stocks(symbol='SH601857', name='中国石油', current=4.44, percent=1.37, pe_ttm=42.328, current_year_percent=-23.84, volume=85973295, industry='石化'),
 Stocks(symbol='SH600028', name='中国石化', current=4.46, percent=0.45, pe_ttm=23.43, current_year_percent=-12.72, volume=119640204, industry='石化'),
 Stocks(symbol='SH601398', name='工商银行', current=5.17, percent=0.39, pe_ttm=5.855, current_year_percent=-12.07, volume=104063910, industry='银行'),
 Stocks(symbol='SH601939', name='建设银行', current=6.43, percent=0.16, pe_ttm=5.939, current_year_percent=-11.07, volume=51089825, industry='银行'),
 Stocks(symbol='SH601628', name='中国人寿', current=28.54, percent=-0.73, pe_ttm=16.348, current_year_percent=-18.15, volume=16644522, industry='保险'),
 Stocks(symbol='SH600036', name='招商银行', current=35.09, percent=0.2, pe_ttm=9.274, current_year_percent=-6.63, volume=71533247, industry='银行'),
 Stocks(symbol='SH601318', name='中国平安', current=74.46, percent=0.62, pe_ttm=10.474, current_year_percent=-12.87, volume=48625473, industry='保险'),
 Stocks(symbol='SH600519', name='贵州茅台', current=1265.7, percent=-0.72, pe_ttm=36.908, current_year_percent=6.99, volume=2466087, industry='白酒')]

tuple(reversed(stocks))

(Stocks(symbol='SH600028', name='中国石化', current=4.46, percent=0.45, pe_ttm=23.43, current_year_percent=-12.72, volume=119640204, industry='石化'),
 Stocks(symbol='SH601628', name='中国人寿', current=28.54, percent=-0.73, pe_ttm=16.348, current_year_percent=-18.15, volume=16644522, industry='保险'),
 Stocks(symbol='SH601857', name='中国石油', current=4.44, percent=1.37, pe_ttm=42.328, current_year_percent=-23.84, volume=85973295, industry='石化'),
 Stocks(symbol='SH600036', name='招商银行', current=35.09, percent=0.2, pe_ttm=9.274, current_year_percent=-6.63, volume=71533247, industry='银行'),
 Stocks(symbol='SH601988', name='中国银行', current=3.48, percent=0.58, pe_ttm=5.42, current_year_percent=-5.69, volume=79563490, industry='银行'),
 Stocks(symbol='SH601288', name='农业银行', current=3.46, percent=0.58, pe_ttm=5.631, current_year_percent=-6.23, volume=118473509, industry='银行'),
 Stocks(symbol='SH601318', name='中国平安', current=74.46, percent=0.62, pe_ttm=10.474, current_year_percent=-12.87, volume=48625473, industry='保险'),
 Stocks(symbol='SH600519', name='贵州茅台', current=1265.7, percent=-0.72, pe_ttm=36.908, current_year_percent=6.99, volume=2466087, industry='白酒'),
 Stocks(symbol='SH601939', name='建设银行', current=6.43, percent=0.16, pe_ttm=5.939, current_year_percent=-11.07, volume=51089825, industry='银行'),
 Stocks(symbol='SH601398', name='工商银行', current=5.17, percent=0.39, pe_ttm=5.855, current_year_percent=-12.07, volume=104063910, industry='银行'))

filter, map, reduce

tuple(filter(lambda s: s.current > 5, stocks))

(Stocks(symbol='SH601398', name='工商银行', current=5.17, percent=0.39, pe_ttm=5.855, current_year_percent=-12.07, volume=104063910, industry='银行'),
 Stocks(symbol='SH601939', name='建设银行', current=6.43, percent=0.16, pe_ttm=5.939, current_year_percent=-11.07, volume=51089825, industry='银行'),
 Stocks(symbol='SH600519', name='贵州茅台', current=1265.7, percent=-0.72, pe_ttm=36.908, current_year_percent=6.99, volume=2466087, industry='白酒'),
 Stocks(symbol='SH601318', name='中国平安', current=74.46, percent=0.62, pe_ttm=10.474, current_year_percent=-12.87, volume=48625473, industry='保险'),
 Stocks(symbol='SH600036', name='招商银行', current=35.09, percent=0.2, pe_ttm=9.274, current_year_percent=-6.63, volume=71533247, industry='银行'),
 Stocks(symbol='SH601628', name='中国人寿', current=28.54, percent=-0.73, pe_ttm=16.348, current_year_percent=-18.15, volume=16644522, industry='保险'))

tuple(s for s in stocks if s.current > 5)

(Stocks(symbol='SH601398', name='工商银行', current=5.17, percent=0.39, pe_ttm=5.855, current_year_percent=-12.07, volume=104063910, industry='银行'),
 Stocks(symbol='SH601939', name='建设银行', current=6.43, percent=0.16, pe_ttm=5.939, current_year_percent=-11.07, volume=51089825, industry='银行'),
 Stocks(symbol='SH600519', name='贵州茅台', current=1265.7, percent=-0.72, pe_ttm=36.908, current_year_percent=6.99, volume=2466087, industry='白酒'),
 Stocks(symbol='SH601318', name='中国平安', current=74.46, percent=0.62, pe_ttm=10.474, current_year_percent=-12.87, volume=48625473, industry='保险'),
 Stocks(symbol='SH600036', name='招商银行', current=35.09, percent=0.2, pe_ttm=9.274, current_year_percent=-6.63, volume=71533247, industry='银行'),
 Stocks(symbol='SH601628', name='中国人寿', current=28.54, percent=-0.73, pe_ttm=16.348, current_year_percent=-18.15, volume=16644522, industry='保险'))

tuple(map(lambda s: (s.name, round(s.volume / 10000, 1)), stocks))

(('工商银行', 10406.4),
 ('建设银行', 5109.0),
 ('贵州茅台', 246.6),
 ('中国平安', 4862.5),
 ('农业银行', 11847.4),
 ('中国银行', 7956.3),
 ('招商银行', 7153.3),
 ('中国石油', 8597.3),
 ('中国人寿', 1664.5),
 ('中国石化', 11964.0))

tuple((s.name, round(s.volume / 10000, 1)) for s in stocks)

(('工商银行', 10406.4),
 ('建设银行', 5109.0),
 ('贵州茅台', 246.6),
 ('中国平安', 4862.5),
 ('农业银行', 11847.4),
 ('中国银行', 7956.3),
 ('招商银行', 7153.3),
 ('中国石油', 8597.3),
 ('中国人寿', 1664.5),
 ('中国石化', 11964.0))

from functools import reduce

reduce?

reduce(lambda acc, v: acc + v.volume, stocks, 0)

698073562

from itertools import accumulate

tuple(accumulate(s.volume for s in stocks))

(104063910,
 155153735,
 157619822,
 206245295,
 324718804,
 404282294,
 475815541,
 561788836,
 578433358,
 698073562)
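
The last running total equals the value `reduce` returned earlier, because `accumulate` yields every intermediate state of the same left fold. A minimal sketch of that relationship, assuming a hypothetical two-field `Stock` tuple and copying only the first three volumes from the output above:

```python
from collections import namedtuple
from functools import reduce
from itertools import accumulate

# Hypothetical, trimmed-down record: only the two fields this sketch needs
Stock = namedtuple("Stock", ["name", "volume"])

sample = (
    Stock("工商银行", 104063910),
    Stock("建设银行", 51089825),
    Stock("贵州茅台", 2466087),
)

# reduce folds left to right, starting from the initial value 0
total = reduce(lambda acc, s: acc + s.volume, sample, 0)

# accumulate keeps every intermediate sum of the same fold
running = tuple(accumulate(s.volume for s in sample))

# sum is the idiomatic shortcut when the fold is plain addition
assert total == running[-1] == sum(s.volume for s in sample)
print(running)
```

When only the final total is needed, `sum` over a generator expression is usually the clearer choice; `reduce` is more useful when the combining step is not plain addition.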

groupby

def box(acc, v):
    # Reducer: file each stock's name under its industry, then return the accumulator
    acc[v.industry].append(v.name)
    return acc

stocks_industry = reduce(box, stocks, {"银行": [], "保险": [], "白酒": [], "石化": []})
stocks_industry

{'银行': ['工商银行', '建设银行', '农业银行', '中国银行', '招商银行'],
 '保险': ['中国平安', '中国人寿'],
 '白酒': ['贵州茅台'],
 '石化': ['中国石油', '中国石化']}

from collections import defaultdict

stocks_industry = reduce(box, stocks, defaultdict(list))
stocks_industry

defaultdict(list,
            {'银行': ['工商银行', '建设银行', '农业银行', '中国银行', '招商银行'],
             '白酒': ['贵州茅台'],
             '保险': ['中国平安', '中国人寿'],
             '石化': ['中国石油', '中国石化']})

stocks_industry = reduce(
    lambda acc, v: {**acc, **{v.industry: acc[v.industry] + [v.name]}},
    stocks,
    {"银行": [], "白酒": [], "保险": [], "石化": []},
)
stocks_industry

{'银行': ['工商银行', '建设银行', '农业银行', '中国银行', '招商银行'],
 '白酒': ['贵州茅台'],
 '保险': ['中国平安', '中国人寿'],
 '石化': ['中国石油', '中国石化']}

from itertools import groupby

# groupby only merges adjacent items, so sort by the same key first
{
    industry: [stock.name for stock in group]
    for industry, group in groupby(
        sorted(stocks, key=lambda s: s.industry), key=lambda s: s.industry
    )
}

{'保险': ['中国平安', '中国人寿'],
 '白酒': ['贵州茅台'],
 '石化': ['中国石油', '中国石化'],
 '银行': ['工商银行', '建设银行', '农业银行', '中国银行', '招商银行']}
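
`itertools.groupby` starts a new group every time the key changes, which is why the input is sorted before grouping. The sketch below shows what happens without the sort, next to a single-pass `defaultdict` alternative that needs no sorting. The `Stock` tuple and the three `sample` rows are a hypothetical subset of the data above.

```python
from collections import defaultdict, namedtuple
from itertools import groupby

Stock = namedtuple("Stock", ["name", "industry"])  # hypothetical trimmed record

sample = (
    Stock("工商银行", "银行"),
    Stock("贵州茅台", "白酒"),
    Stock("农业银行", "银行"),
)

# Without sorting, the two bank rows are not adjacent, so the bank key appears twice
unsorted_keys = [key for key, _ in groupby(sample, key=lambda s: s.industry)]

# Single pass, no sort: append each name to the list for its industry
grouped = defaultdict(list)
for stock in sample:
    grouped[stock.industry].append(stock.name)

print(unsorted_keys)
print(dict(grouped))
```

The `defaultdict` loop keeps industries in first-seen order, while the sorted `groupby` version orders them by key.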

multiprocessing and concurrent.futures

import time
import os
import multiprocessing

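def cmpt_open(x):
    # Log strings: "进程" = process, "正在计算" = computing, "完成计算" = finished computing
    print(f"进程：{os.getpid()} 正在计算 {x.name}")
    time.sleep(1)  # simulate one second of slow work per stock
    # Undo the day's percent change to recover the price the change was measured from
    rst = {"name": x.name, "open": x.current / (1 + x.percent * 0.01)}
    print(f"进程：{os.getpid()} 完成计算 {x.name}")
    return rst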

%%time
result = tuple(map(cmpt_open, stocks))
result

进程：81395 正在计算 工商银行
进程：81395 完成计算 工商银行
进程：81395 正在计算 建设银行
进程：81395 完成计算 建设银行
进程：81395 正在计算 贵州茅台
进程：81395 完成计算 贵州茅台
进程：81395 正在计算 中国平安
进程：81395 完成计算 中国平安
进程：81395 正在计算 农业银行
进程：81395 完成计算 农业银行
进程：81395 正在计算 中国银行
进程：81395 完成计算 中国银行
进程：81395 正在计算 招商银行
进程：81395 完成计算 招商银行
进程：81395 正在计算 中国石油
进程：81395 完成计算 中国石油
进程：81395 正在计算 中国人寿
进程：81395 完成计算 中国人寿
进程：81395 正在计算 中国石化
进程：81395 完成计算 中国石化
CPU times: user 18.9 ms, sys: 2.09 ms, total: 21 ms
Wall time: 10 s

({'name': '工商银行', 'open': 5.149915330212172},
 {'name': '建设银行', 'open': 6.419728434504791},
 {'name': '贵州茅台', 'open': 1274.8791297340854},
 {'name': '中国平安', 'open': 74.00119260584377},
 {'name': '农业银行', 'open': 3.4400477232054083},
 {'name': '中国银行', 'open': 3.459932392125671},
 {'name': '招商银行', 'open': 35.019960079840324},
 {'name': '中国石油', 'open': 4.379994081089079},
 {'name': '中国人寿', 'open': 28.749874080789763},
 {'name': '中国石化', 'open': 4.440019910403186})

%%time
with multiprocessing.Pool() as pool:  # the context manager shuts the workers down afterwards
    result = pool.map(cmpt_open, stocks)
result

进程：47440 正在计算 建设银行
进程：47443 正在计算 农业银行
进程：47447 正在计算 招商银行
进程：47444 正在计算 中国银行
进程：47451 正在计算 中国人寿
进程：47442 正在计算 中国平安
进程：47452 正在计算 中国石化
进程：47439 正在计算 工商银行
进程：47450 正在计算 中国石油
进程：47441 正在计算 贵州茅台
进程：47440 完成计算 建设银行
进程：47443 完成计算 农业银行
进程：47447 完成计算 招商银行
进程：47442 完成计算 中国平安
进程：47439 完成计算 工商银行
进程：47444 完成计算 中国银行
进程：47451 完成计算 中国人寿
CPU times: user 40 ms, sys: 159 ms, total: 199 ms
Wall time: 1.19 s
进程：47450 完成计算 中国石油
进程：47441 完成计算 贵州茅台
进程：47452 完成计算 中国石化

[{'name': '工商银行', 'open': 5.149915330212172},
 {'name': '建设银行', 'open': 6.419728434504791},
 {'name': '贵州茅台', 'open': 1274.8791297340854},
 {'name': '中国平安', 'open': 74.00119260584377},
 {'name': '农业银行', 'open': 3.4400477232054083},
 {'name': '中国银行', 'open': 3.459932392125671},
 {'name': '招商银行', 'open': 35.019960079840324},
 {'name': '中国石油', 'open': 4.379994081089079},
 {'name': '中国人寿', 'open': 28.749874080789763},
 {'name': '中国石化', 'open': 4.440019910403186}]

import concurrent.futures

%%time
with concurrent.futures.ProcessPoolExecutor() as pool:
    result = pool.map(cmpt_open, stocks)
tuple(result)

进程：48625 正在计算 工商银行
进程：48626 正在计算 建设银行
进程：48628 正在计算 中国平安
进程：48627 正在计算 贵州茅台
进程：48631 正在计算 招商银行
进程：48629 正在计算 农业银行
进程：48634 正在计算 中国石化
进程：48630 正在计算 中国银行
进程：48633 正在计算 中国人寿
进程：48632 正在计算 中国石油
进程：48625 完成计算 工商银行
进程：48626 完成计算 建设银行
进程：48628 完成计算 中国平安
进程：48627 完成计算 贵州茅台
进程：48629 完成计算 农业银行
进程：48631 完成计算 招商银行
进程：48632 完成计算 中国石油
进程：48634 完成计算 中国石化
进程：48633 完成计算 中国人寿
进程：48630 完成计算 中国银行
CPU times: user 32.1 ms, sys: 192 ms, total: 224 ms
Wall time: 1.22 s

({'name': '工商银行', 'open': 5.149915330212172},
 {'name': '建设银行', 'open': 6.419728434504791},
 {'name': '贵州茅台', 'open': 1274.8791297340854},
 {'name': '中国平安', 'open': 74.00119260584377},
 {'name': '农业银行', 'open': 3.4400477232054083},
 {'name': '中国银行', 'open': 3.459932392125671},
 {'name': '招商银行', 'open': 35.019960079840324},
 {'name': '中国石油', 'open': 4.379994081089079},
 {'name': '中国人寿', 'open': 28.749874080789763},
 {'name': '中国石化', 'open': 4.440019910403186})

%%time
with concurrent.futures.ThreadPoolExecutor() as pool:
    result = pool.map(cmpt_open, stocks)
tuple(result)

进程：81395 正在计算 工商银行
进程：81395 正在计算 建设银行
进程：81395 正在计算 贵州茅台
进程：81395 正在计算 中国平安
进程：81395 正在计算 农业银行
进程：81395 正在计算 中国银行
进程：81395 正在计算 招商银行
进程：81395 正在计算 中国石油
进程：81395 正在计算 中国人寿进程：81395 正在计算 中国石化

进程：81395 完成计算 工商银行
进程：81395 完成计算 建设银行
进程：81395 完成计算 贵州茅台
进程：81395 完成计算 中国平安
进程：81395 完成计算 农业银行
进程：81395 完成计算 中国银行进程：81395 完成计算 招商银行

进程：81395 完成计算 中国石油
进程：81395 完成计算 中国人寿进程：81395 完成计算 中国石化

CPU times: user 19.2 ms, sys: 8.83 ms, total: 28 ms
Wall time: 1.01 s

({'name': '工商银行', 'open': 5.149915330212172},
 {'name': '建设银行', 'open': 6.419728434504791},
 {'name': '贵州茅台', 'open': 1274.8791297340854},
 {'name': '中国平安', 'open': 74.00119260584377},
 {'name': '农业银行', 'open': 3.4400477232054083},
 {'name': '中国银行', 'open': 3.459932392125671},
 {'name': '招商银行', 'open': 35.019960079840324},
 {'name': '中国石油', 'open': 4.379994081089079},
 {'name': '中国人寿', 'open': 28.749874080789763},
 {'name': '中国石化', 'open': 4.440019910403186})
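
The thread pool keeps up with the process pools here because `time.sleep` releases the GIL, so the waits overlap inside a single process (note the identical PID on every line of the thread-pool output). A minimal sketch of the same effect, with a hypothetical `slow_square` task standing in for `cmpt_open`; in CPython, CPU-bound work would not be expected to speed up this way under threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Hypothetical stand-in for a slow, wait-dominated task."""
    time.sleep(0.2)  # sleeping releases the GIL, so other threads can run
    return n * n


numbers = range(5)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # Executor.map returns results in input order, whatever order tasks finish in
    threaded = tuple(pool.map(slow_square, numbers))
elapsed = time.perf_counter() - start

print(threaded)
print(f"{elapsed:.2f} s for 5 tasks of 0.2 s each")
```

Threads avoid the cost of starting worker processes and pickling arguments, which makes them a reasonable default for I/O-bound tasks; process pools remain the usual choice for CPU-bound ones.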

cytoolz, fn, and funcy

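Third-party packages supply a large set of higher-order functions that extend Python's functional-programming toolkit, including pure functions, currying (closely related to partial application), lazy evaluation, and parallelization.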
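
As a rough illustration of two of these ideas without any third-party dependency, the sketch below uses `functools.partial` for partial application and the built-in lazy `map` for deferred evaluation. The helpers `scale`, `times_100`, and `traced` are hypothetical names used only in this example.

```python
from functools import partial
from itertools import islice


def scale(factor, value):
    """Pure function: the result depends only on the arguments."""
    return factor * value


# Partial application: fix the first argument to get a one-argument function
times_100 = partial(scale, 100)

calls = []


def traced(value):
    """Record each call so the laziness is observable."""
    calls.append(value)
    return times_100(value)


# Nothing is computed here: map only builds an iterator
lazy = map(traced, range(1_000_000))

# Pulling three items triggers exactly three calls
first_three = list(islice(lazy, 3))
print(first_three, calls)
```

Because `map` is lazy, the million-element range is never walked in full; only the items actually requested are computed.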

The following example uses cytoolz to count character frequencies in A Tale of Two Cities:

from cytoolz import map, concat, frequencies  # cytoolz's map is lazy by default

%%time
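with open("tale-of-two-cities.txt", "r", encoding="utf-8-sig") as f:
    # upper-case each line lazily, flatten the lines into characters, then count
    freq = frequencies(concat(map(str.upper, f)))
freq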

CPU times: user 79.4 ms, sys: 2.53 ms, total: 82 ms
Wall time: 89.1 ms

{'T': 54050,
 'H': 38974,
 'E': 74839,
 ' ': 126957,
 'P': 9960,
 'R': 37212,
 'O': 46537,
 'J': 714,
 'C': 13899,
 'G': 12547,
 'U': 16738,
 'N': 42385,
 'B': 8422,
 'K': 4787,
 'F': 13563,
 'A': 48167,
 'L': 22048,
 'W': 14121,
 'I': 41016,
 'S': 37587,
 ',': 13274,
 'Y': 12185,
 'D': 28046,
 '\n': 16271,
 'M': 15296,
 'V': 5204,
 '.': 6815,
 '-': 2431,
 ':': 269,
 '1': 63,
 '9': 18,
 '4': 10,
 '[': 2,
 '#': 1,
 '8': 16,
 ']': 2,
 '2': 14,
 '0': 20,
 '7': 14,
 '3': 13,
 '*': 90,
 'X': 723,
 "'": 1269,
 'Q': 666,
 ';': 1108,
 'Z': 215,
 '(': 151,
 ')': 151,
 '"': 5681,
 '!': 955,
 '?': 913,
 '_': 182,
 'É': 2,
 '6': 9,
 '5': 13,
 '/': 24,
 '%': 1,
 '@': 2,
 '$': 2}

from collections import Counter

%%time
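with open("tale-of-two-cities.txt", "r", encoding="utf-8-sig") as f:
    # read the whole file into one string, then upper-case it character by character
    counts = Counter(map(str.upper, f.read()))
counts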

CPU times: user 190 ms, sys: 2.68 ms, total: 193 ms
Wall time: 190 ms

Counter({'T': 54050,
         'H': 38974,
         'E': 74839,
         ' ': 126957,
         'P': 9960,
         'R': 37212,
         'O': 46537,
         'J': 714,
         'C': 13899,
         'G': 12547,
         'U': 16738,
         'N': 42385,
         'B': 8422,
         'K': 4787,
         'F': 13563,
         'A': 48167,
         'L': 22048,
         'W': 14121,
         'I': 41016,
         'S': 37587,
         ',': 13274,
         'Y': 12185,
         'D': 28046,
         '\n': 16271,
         'M': 15296,
         'V': 5204,
         '.': 6815,
         '-': 2431,
         ':': 269,
         '1': 63,
         '9': 18,
         '4': 10,
         '[': 2,
         '#': 1,
         '8': 16,
         ']': 2,
         '2': 14,
         '0': 20,
         '7': 14,
         '3': 13,
         '*': 90,
         'X': 723,
         "'": 1269,
         'Q': 666,
         ';': 1108,
         'Z': 215,
         '(': 151,
         ')': 151,
         '"': 5681,
         '!': 955,
         '?': 913,
         '_': 182,
         'É': 2,
         '6': 9,
         '5': 13,
         '/': 24,
         '%': 1,
         '@': 2,
         '$': 2})
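
In the cytoolz version, the `concat` step flattens the iterable of lines into one stream of characters, so the file is never held in memory as a single string. A standard-library sketch of the same pipeline, assuming `itertools.chain.from_iterable` as the stand-in for `concat` and a small in-memory `io.StringIO` text in place of the novel file:

```python
import io
from collections import Counter
from itertools import chain

# Hypothetical in-memory text standing in for the novel file
text = io.StringIO("It was the best of times,\nit was the worst of times,\n")

# Iterating a file-like object yields lines; upper-case each line lazily,
# then flatten the lines into a single stream of characters
chars = chain.from_iterable(line.upper() for line in text)

counts = Counter(chars)
print(counts.most_common(3))
```

Upper-casing whole lines calls `str.upper` once per line rather than once per character, which may account for part of the timing gap between the two cells above.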