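Preface

Like yesterday's post, today's picks out a few articles I found interesting.
I have been slacking a bit again, so I probably won't get through all of them, orz.

Main text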

1. 8 More Python Best Practices for Writing Industry-Standard Code

A few Python best practices I learnt after entering the software industry

[Understood] Original article: 8 More Python Best Practices for Writing Industry-Standard Code by Madhushan Buwaneswaran

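The article presents eight practices for writing Python the way industry does, which the author calls **"industry-standard code"**. Unlike code that only you will ever read, this code is written for a team, so three properties matter most: it should be readable, reusable, and modifiable.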

1. Always use constants instead of a specific number

Code often involves fixed values. Give each one a named constant instead of writing the bare number. The following is a bad practice:

weight = 9.8 * mass

The statement runs without any problem, but a reader without the domain knowledge cannot tell what 9.8 stands for. Defining it as a named constant makes the intent clear. A better version:

GRAVITATIONAL_ACCELERATION = 9.8
weight = GRAVITATIONAL_ACCELERATION * mass

Define constants at the top of the Python file, in upper case. You can even collect them in a constants.py file and import them from there.

2. Use verbs as function and method names

Name functions with a verb rather than a noun. You will often want a variable with the noun's name, and a function that already uses that name clashes with it. A bad practice:

def prime_factors(input_number):
    ...  # some code

# With the name taken by the function, a variable named "prime_factors"
# in the same scope would replace the function.

So the best approach is to always name functions <do>_<something>. For computing prime factors, a good practice looks like this:

def calculate_prime_factors(input_number):
    ...  # some code
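
To make the naming point concrete, here is a minimal sketch with the function body filled in. The trial-division implementation is my own assumption (the article only shows the signature); the point is that the verb-named function and the noun-named variable can coexist.

```python
def calculate_prime_factors(input_number):
    """Return the prime factors of input_number in ascending order."""
    prime_factors = []  # the noun is free to be used as a variable name
    remaining = input_number
    divisor = 2
    while divisor * divisor <= remaining:
        # Divide out each factor as many times as it occurs.
        while remaining % divisor == 0:
            prime_factors.append(divisor)
            remaining //= divisor
        divisor += 1
    if remaining > 1:
        # Whatever is left is itself prime.
        prime_factors.append(remaining)
    return prime_factors


# The call site reads naturally: a noun receives the result of a verb.
prime_factors = calculate_prime_factors(84)
```

With a noun-named function, the last line would have rebound the function's own name to a list.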

3. Define members as private or protected as per the access scope requirements

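In OOP code, every function, variable, and class should be able to reach only the resources it actually needs. Member variables and methods that are not part of the public interface should therefore be marked private or protected. Python does not enforce this; it is a naming convention: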

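_var_name = 'value'   # a single leading '_' marks a protected member (convention only)
__var_name = 'value'  # a double leading '__' marks a private member (name-mangled inside a class)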
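
A small sketch of how the two prefixes behave inside a class. The `Account` class and its attributes are hypothetical names chosen for illustration; the behaviour shown (convention only for `_`, name mangling for `__`) is standard Python.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # public
        self._balance = balance   # protected by convention only
        self.__pin = "0000"       # private: stored as _Account__pin

    def check_pin(self, candidate):
        # Inside the class, the double-underscore name works as written.
        return candidate == self.__pin


account = Account("Anmol", 100)

# Nothing stops outside code from reading a "protected" attribute.
balance_seen_from_outside = account._balance

# The "private" attribute is not reachable under its plain name,
# only under the mangled one.
plain_name_exists = hasattr(account, "__pin")
mangled_name_exists = hasattr(account, "_Account__pin")
```

So neither prefix is an access control mechanism; both are signals to other developers about what they should not touch.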

4. Don't do import *

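This looks convenient but is a bad idea, and it causes two serious problems: worse readability and overridden definitions. When you import every entity at once, a reader can no longer tell which package a name came from, which badly hurts readability. In addition, two packages may export entities with the same name, or a name may match one you define later in the file; in either case the later definition silently replaces the earlier one. The better approach is to import exactly what you use. A good practice: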

import package
import package.module as short_name
from package.module import SampleClass

# Accessing the imports
instance_1 = package.module.SampleClass()
instance_2 = short_name.SampleClass()
instance_3 = SampleClass()
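
The overriding problem can be shown with two standard-library modules that both export `sqrt`. This is only a sketch: `exec` is used here solely so that the star imports run in an isolated dictionary instead of polluting the real module namespace.

```python
import cmath
import math

namespace = {}

# First star import: sqrt refers to math.sqrt.
exec("from math import *", namespace)
first_sqrt = namespace["sqrt"]

# Second star import: sqrt is silently replaced by cmath.sqrt.
exec("from cmath import *", namespace)
second_sqrt = namespace["sqrt"]

sqrt_was_replaced = first_sqrt is not second_sqrt
```

After both imports, a call to `sqrt(-1)` returns a complex number instead of raising `ValueError`, and nothing in the calling code shows why.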

5. Use code formatters (or linters)

It should be no surprise that code formatters exist. As the article notes, Black is a formatter and Pylint is a linter; they keep the format of your code consistent, which makes it more readable, modifiable, and reusable.

6. Write unit tests

Unit tests are important, even though writing them is boring and time consuming. Python's built-in unittest library is a good package for unit testing.
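
A minimal sketch of what such a test can look like. The `calculate_weight` function and the test names are assumptions made for illustration, reusing the constant from practice 1; the suite is run explicitly here so the snippet can be pasted into a script or notebook.

```python
import unittest

GRAVITATIONAL_ACCELERATION = 9.8  # m/s^2


def calculate_weight(mass):
    """Return the weight in newtons of a mass given in kilograms."""
    if mass < 0:
        raise ValueError("mass must be non-negative")
    return GRAVITATIONAL_ACCELERATION * mass


class CalculateWeightTest(unittest.TestCase):
    def test_weight_of_ten_kilograms(self):
        # assertAlmostEqual avoids false failures from float rounding.
        self.assertAlmostEqual(calculate_weight(10), 98.0)

    def test_negative_mass_is_rejected(self):
        with self.assertRaises(ValueError):
            calculate_weight(-1)


# Build and run the suite without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CalculateWeightTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would usually live in their own file and be run with `python -m unittest`.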

7. Log the errors

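Nobody's code is bug free. Even after linting/formatting, unit testing, and manual testing, code can still fail occasionally in production. That is normal, and it is why you need to log what your code does, for example with the standard logging package. Besides logging.error, the same package provides logging.warning, logging.info, and logging.debug for less severe messages.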

import logging

try:
    ...  # some code that might possibly trigger an error
except SomeError:  # SomeError stands for the specific exception you expect
    logging.error("log context of the error here")

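
A slightly fuller sketch of the same idea. The logger name `weight_service` and the `parse_mass` helper are hypothetical; `logger.exception` is the standard call that logs at ERROR level and attaches the traceback.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s:%(name)s:%(message)s",
)
logger = logging.getLogger("weight_service")  # hypothetical logger name


def parse_mass(raw_value):
    """Convert user input to a float, logging the failure instead of crashing."""
    try:
        return float(raw_value)
    except ValueError:
        # Logs at ERROR level and includes the traceback of the ValueError.
        logger.exception("Could not parse mass from %r", raw_value)
        return None


good_mass = parse_mass("12.5")
bad_mass = parse_mass("abc")
```

Catching the specific exception (`ValueError`) rather than a bare `except` keeps unrelated bugs visible.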
8. Generate requirements.txt file with versions

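Many packages are updated constantly; TensorFlow is one example. A major release can change a lot: TensorFlow 2.0 differs substantially from 1.x, for instance by making Keras its central high-level API. So for every project we need to record exactly what its environment is. Otherwise, when we recreate the environment later, it will no longer be the one we originally used, and a series of bugs is likely. Each project should therefore get its own isolated environment and its own requirements.txt. The steps are: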

Have a separate virtual environment for each project
Track the installed packages using a requirements.txt file per project
When tracking the packages track them with the exact package versions

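Once again, this really matters; without it your code may become unusable. In fields such as neural networks and deep learning, packages change quickly, and much code no longer runs in a current environment after one or two years.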

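The file can be generated automatically with the following command. (Note: it records every package in the current environment, so each project must have its own environment.)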

pip freeze > requirements.txt
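
To see what "exact package versions" means without leaving Python, the sketch below lists the installed distributions as `name==version` lines using the standard `importlib.metadata` module. It is only an approximation for illustration, not a replacement for `pip freeze`, and the function name `list_pinned_requirements` is my own.

```python
from importlib import metadata


def list_pinned_requirements():
    """Return 'name==version' lines for the distributions in this environment."""
    pinned = set()
    for distribution in metadata.distributions():
        name = distribution.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            pinned.add(f"{name}=={distribution.version}")
    return sorted(pinned, key=str.lower)


pinned_requirements = list_pinned_requirements()
```

Each line pins one package to the version that is installed right now, which is what lets the environment be recreated later.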

EX1. Variable Naming Convention

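The article above is a follow-up to an earlier one, Best Practices for Writing Industry Standard Python Code, so its points are covered here as well. A good name clearly matters. A name should carry a direct meaning that a reader understands at a glance. (Note: avoid the single-letter names 'O' and 'l', because they look like 0 and 1.) The following is a bad example, where the names tell you nothing: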

x = 'Anmol Tomar'
y, z = x.split()
print(z, y, sep=', ')

# Output: Tomar, Anmol

By contrast, the following is a good example, because each variable's meaning is clear:

name = 'Anmol Tomar'
first_name, last_name = name.split()
print(last_name, first_name, sep=', ')

# Output: Tomar, Anmol

EX2. Comments in function

Comments are very important; they play a key role in helping people understand a program. There are two kinds, inline and block (multi-line) comments:

a = 10  # Single inline comment

""" Example of Multi-line comment
    Line 2 of a multiline comment
"""

A bad example has no comments at all, so we cannot tell what it is meant to do.

```python
def function_name(param1, param2):
    print(param1)
```

A good example is fully documented:

def function_name(param1: int, param2: str) -> bool:
    """Example function with PEP 484 type annotations.

    Args:
        param1: The first parameter.
        param2: The second parameter.

    Returns:
        True for success, False otherwise.
    """
    print(param1)
    return True

EX3. Indentation

Indentation is one of Python's defining features. Used well, it also makes code much easier to read. The following is a rather bad example:

# Arguments in 2nd line are not aligned
# with arguments in the 1st line.
func = function_name(var_one, var_two,
      var_three, var_four)

# Further indentation required as indentation 
# is not distinguishable -Arguments are 
# having same indentation as print function
def function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

A good example uses indentation consistently, so that elements at different levels are visually separated **(PEP 8 recommends 4 spaces per indentation level and prefers spaces over tabs)**. A good example:

# Arguments in 2nd line are aligned with 
# the arguments in the 1st line.
func = function_name(var_one, var_two,
                     var_three, var_four)

# Add 4 spaces(an extra level of indentation)
# to distinguish arguments from the rest.
def function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

EX4. Maximum Line Length

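PEP 8 (the Python style guide) limits lines to a maximum of 79 characters. A longer line must be split, using a backslash (\) for line continuation. (Inside parentheses, brackets, or braces no backslash is needed; the continuation is implicit.)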

Bad example:

#having length more than 79 characters. 
joined_df = pd.merge(countries_df, countries_lat_lon, on = 'CountryCode', how = 'inner')

Good example, with the line split:

# Having length less than 79 characters - within 
# parenthesis line continuation is by default.
joined_df = pd.merge(countries_df, countries_lat_lon,
                     on='CountryCode', how='inner')

# Using \ for the line continuation
from mypackage import example1, \
                      example2, example3
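
PEP 8 states that the preferred way to wrap long lines is implied continuation inside parentheses, brackets, and braces, so a long import can also be written without a backslash. A small sketch, using `os.path` only because it is always available:

```python
# Parentheses make the continuation implicit; no backslash is needed.
from os.path import (
    basename,
    dirname,
    splitext,
)

# The same applies to any long expression wrapped in parentheses.
file_stem = (
    splitext(basename("/data/countries/lat_lon.csv"))[0]
)
```

A trailing backslash breaks silently if whitespace follows it, which is one reason the parenthesised form is usually preferred.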

EX5. Imports on separate lines

When importing packages, put each package on its own line rather than crowding them together. Bad example:

import sys, os, pandas

Good example, one import per line:

import os
import sys
import pandas

EX6. Spacing Recommendations

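Whitespace around operators follows rules; you neither put spaces everywhere nor leave them out everywhere.

Surround the following operators with a single space on each side: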

assignment (=)
augmented assignment (+=, -=, etc.)
comparisons (==, <, >, !=, <=, >=, in, not in, is, is not)
booleans (and, or, not)

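For other operators, such as arithmetic ones, add spaces only around the operators with the lowest priority when priorities are mixed. Bad example: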

i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)

Good example, with spaces only around the appropriate operators:

i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)

2. Why Not Use CNN to Extract Features?

How to find unexpected patterns in your data

[Not finished] Original article: Why Not Use CNN to Extract Features? by Anthony Cavin

The author introduces anomaly detection: finding unusual patterns in data. The approach is manifold learning, mainly with auto-encoders.

An auto-encoder consists of two parts:

The encoder network: reduces a high dimensional input in a low dimensional space called latent space.
The decoder network: maps the latent space into a representation of the input pictures.

Auto-encoders can be used for:

Dimensional reduction
Image compression
Data denoising
Anomaly detection

Here the auto-encoder is used mainly for anomaly detection. The article explains how that works: "For the latter, classical methods focus on spotting anomalies by looking at the difference between the input and its reconstructed version. The assumption is that the auto-encoder performs well when the input is similar to the training dataset but produces high reconstruction errors around anomalies. To use this method, we train the auto-encoder with anomaly-free data and look at the difference between the input and output of the auto-encoder."
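
The reconstruction-error idea can be sketched without any deep learning library. The code below is not the article's CNN auto-encoder: as a stand-in it uses a linear "encoder" and "decoder" with a one-dimensional latent space, fitted by power iteration on anomaly-free 2-D points. The data, function names, and threshold rule are all assumptions made for illustration.

```python
import math
import random


def learn_direction(samples, iterations=50):
    """Fit the 1-D latent axis by power iteration on the second-moment matrix.

    Assumes the samples are roughly centred on the origin.
    """
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    syy = sum(y * y for _, y in samples)
    direction = (1.0, 0.0)
    for _ in range(iterations):
        dx = sxx * direction[0] + sxy * direction[1]
        dy = sxy * direction[0] + syy * direction[1]
        norm = math.hypot(dx, dy)
        direction = (dx / norm, dy / norm)
    return direction


def encode(point, direction):
    """Encoder: project the 2-D input onto the 1-D latent space."""
    return point[0] * direction[0] + point[1] * direction[1]


def decode(code, direction):
    """Decoder: map the latent value back to input space."""
    return (code * direction[0], code * direction[1])


def reconstruction_error(point, direction):
    """Squared distance between the input and its reconstruction."""
    rx, ry = decode(encode(point, direction), direction)
    return (point[0] - rx) ** 2 + (point[1] - ry) ** 2


# Anomaly-free training data: points close to the line y = 2x.
rng = random.Random(0)
normal_samples = [
    (t / 10, 2 * t / 10 + rng.gauss(0, 0.05)) for t in range(-20, 21)
]

direction = learn_direction(normal_samples)

# Simple threshold: the worst reconstruction error seen on normal data.
threshold = max(reconstruction_error(p, direction) for p in normal_samples)

# A point far from the learned manifold reconstructs badly.
anomaly = (1.0, -2.0)
anomaly_error = reconstruction_error(anomaly, direction)
is_anomaly = anomaly_error > threshold
```

The same pattern carries over to a real auto-encoder: train on anomaly-free inputs, then flag inputs whose reconstruction error exceeds a threshold chosen from the training data.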