Tips and tricks in Python (part 2)
Read files
# my old method
with open(path, 'r') as f:
    lines = [x.replace('\n', '') for x in f.readlines()]
# a better method
with open(path, 'r') as f:
    lines = f.read().splitlines()
# perhaps the shortest method, but the file is never closed explicitly
lines = open(path, 'r').read().splitlines()
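If you like the one-liner but still want the file to be closed for you, `pathlib` may be a reasonable middle ground. The sketch below is only an illustration: the temporary file and its content are made up so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

# Create a throwaway file so the sketch is self-contained (hypothetical content).
with tempfile.TemporaryDirectory() as tmpDir:
    path = Path(tmpDir) / 'example.txt'
    path.write_text('first\nsecond\nthird\n')

    # read_text() opens the file, reads it and closes it again.
    lines = path.read_text().splitlines()

print(lines)  # ['first', 'second', 'third']
```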
Pandas
A slow approach
import pandas as pd
def extractDataMethod(series):
    dict_ = {
        'open': series.date.min(),
        'closed': series.date.max(),
    }
    return pd.Series(dict_)
resultDf = combinedDf.groupby(['itemcode', 'shopcode'], as_index = False).apply(extractDataMethod)
A faster approach
openDf = combinedDf.groupby(['itemcode', 'shopcode'])['date'].min()
closedDf = combinedDf.groupby(['itemcode', 'shopcode'])['date'].max()
openDf.name = 'open'
closedDf.name = 'closed'
resultDf = pd.concat([openDf, closedDf], axis = 1)
resultDf.reset_index(inplace = True)
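Another option, which I have not benchmarked against the two approaches above, is pandas' named aggregation: a single `groupby` call that produces both columns. The sample data below is invented; only the column names follow the snippets above.

```python
import pandas as pd

# Hypothetical sample data using the same column names as in the post.
combinedDf = pd.DataFrame({
    'itemcode': ['a', 'a', 'b', 'b'],
    'shopcode': [1, 1, 2, 2],
    'date': pd.to_datetime(['2020-01-03', '2020-01-01', '2020-02-05', '2020-02-09']),
})

# Named aggregation: new_column=(source_column, aggregation_function).
resultDf = (
    combinedDf
    .groupby(['itemcode', 'shopcode'])
    .agg(open=('date', 'min'), closed=('date', 'max'))
    .reset_index()
)
print(resultDf)
```

The result has the columns itemcode, shopcode, open and closed, just like the concat version.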
Read params
Suppose we read a line from a file that follows this convention: on each line, the key and the value are separated by a colon.
# data.txt
projectFolder:/path/to/project/
dataFolder:/path/to/data/
with open('data.txt', 'r') as f:
    allLines = f.readlines()
line = allLines[0]
The task is to convert the given line to a (key, value) pair for further processing. The straightforward way to handle it would be:
line = line.replace('\n', '')
key, value = line.split(':')
I was happy with that approach until a Windows user came by and reported an error when using the program. Their data.txt contains
dataFolder:C:/path/to/data/
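Unsurprisingly, a path can contain colons, especially on Windows. In that case line.split(':') returns three parts, and unpacking them into key and value raises a ValueError. This leads to the following implementation, which cuts the line at the first colon only: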
line = line.replace('\n', '')
firstIndexOfColon = line.find(':')
key = line[:firstIndexOfColon]
value = line[firstIndexOfColon + 1:]
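The same idea can probably be written more compactly with `str.partition`, which splits at the first occurrence of the separator. This is just a sketch; `parseLine` is a hypothetical helper name, not something from the original program.

```python
def parseLine(line):
    """Split a 'key:value' line at the first colon only (hypothetical helper)."""
    # rstrip('\r\n') also drops a Windows-style line ending.
    key, _, value = line.rstrip('\r\n').partition(':')
    return key, value

print(parseLine('projectFolder:/path/to/project/\n'))  # ('projectFolder', '/path/to/project/')
print(parseLine('dataFolder:C:/path/to/data/\n'))      # ('dataFolder', 'C:/path/to/data/')
```

`line.split(':', 1)` behaves the same way when a colon is present. When there is no colon at all, `partition` returns the whole line as the key and an empty value, whereas unpacking the result of `split` would raise an error.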
Type hints in Python
from typing import List, Optional, Tuple
foo: List[int] = [1, 2, 3]
bar: Tuple[int, ...] = (1, 2, 3, 4)
ham: Tuple[int, float, str] = (1, 2.0, 'three')
# egg: Tuple[int] = (1, 2, 3, 4) # wrong: Tuple[int] means a tuple of exactly one int
# Use Optional[] for values that could be None
qux: Optional[str] = None
qux = 'None' # also fine: the string 'None' is a str
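Type hints tend to pay off most in function signatures. Here is a small sketch; `findFirstNegative` is a made-up function for illustration. Keep in mind that Python does not enforce the hints at runtime, so they only help readers and static checkers.

```python
from typing import List, Optional, Tuple

def findFirstNegative(values: List[int]) -> Optional[Tuple[int, int]]:
    """Return (index, value) of the first negative number, or None if there is none."""
    for index, value in enumerate(values):
        if value < 0:
            return index, value
    return None

print(findFirstNegative([3, -1, 4, -5]))  # (1, -1)
print(findFirstNegative([1, 2, 3]))       # None
```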
Working with zip files in Python
from zipfile import ZipFile
with ZipFile('/path/to/data.zip') as zipObj:
    zipObj.printdir()
    # zipObj.extractall()
    for fileObj in zipObj.infolist():
        print(fileObj, fileObj.filename)
    with zipObj.open('subPath/to/file.txt', 'r') as f:
        data = f.readlines() # a list of bytes objects, not str
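To try this without a real archive on disk, the sketch below builds a small zip file in memory and reads it back. The member name and its content are invented for the example, and the bytes are decoded assuming UTF-8.

```python
import io
from zipfile import ZipFile

# Build a tiny archive in memory (hypothetical member name and content).
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zipObj:
    zipObj.writestr('subPath/to/file.txt', 'first\nsecond\n')

# Read it back: members are opened in binary mode, so decode explicitly.
buffer.seek(0)
with ZipFile(buffer) as zipObj:
    names = zipObj.namelist()
    with zipObj.open('subPath/to/file.txt', 'r') as f:
        lines = f.read().decode('utf-8').splitlines()

print(names)  # ['subPath/to/file.txt']
print(lines)  # ['first', 'second']
```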
Sorting a NumPy array by a column
The question and solution. A solution to sort by multiple columns.
import numpy as np
arr = np.array([
    [9, 2, 3],
    [4, 5, 6],
    [7, 0, 5],
])
arr[arr[:, 1].argsort()] # sort the rows by the second column
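For sorting by multiple columns, one common way (not necessarily the one from the linked solution) is `np.lexsort`. The sketch below sorts by the second column and breaks ties with the third; the array is a made-up example containing a tie.

```python
import numpy as np

arr = np.array([
    [9, 2, 3],
    [4, 5, 6],
    [7, 2, 1],
    [1, 0, 5],
])

# np.lexsort treats the LAST key as the primary one:
# sort by the second column, then break ties with the third column.
order = np.lexsort((arr[:, 2], arr[:, 1]))
sortedArr = arr[order]
print(sortedArr)
```

The rows come out as [1, 0, 5], [7, 2, 1], [9, 2, 3], [4, 5, 6].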
Extract the single member of a set
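mySet = {27}
(element, ) = mySet # raises ValueError unless the set has exactly one element
element = next(iter(mySet)) # returns an arbitrary element if the set has several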
Verify non-empty intersections
len(set(M) & set(L)) >= 1 # naive way
not M.isdisjoint(L) # it's better for (very) small list sizes; M must be a set here
any(x in M for x in L) # better way, where len(L) < len(M); stops at the first match
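A quick sanity check that the three variants agree, on made-up data. `haveCommonElement` is a hypothetical helper that wraps the `isdisjoint` variant so it also works when both inputs are plain lists.

```python
def haveCommonElement(M, L):
    """Hypothetical helper: True if M and L share at least one element."""
    # isdisjoint() is a set method, but its argument can be any iterable.
    return not set(M).isdisjoint(L)

M = list(range(1000))
overlapping = [999, 1000, 1001]
disjoint = [-1, -2]

for L in (overlapping, disjoint):
    naive = len(set(M) & set(L)) >= 1
    viaAny = any(x in M for x in L)
    print(naive, viaAny, haveCommonElement(M, L))
```

This prints True True True for the overlapping list and False False False for the disjoint one.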
Itertools library
import itertools
itertools.product([1, 2], repeat = 3) # 8 elements
itertools.product([1, 2], [3, 4, 5]) # 6 elements
itertools.combinations([1, 2, 3, 4, 5], r = 3) # 10 elements
itertools.permutations([1, 2, 3, 4, 5], r = 3) # 60 elements
itertools.permutations([1, 2, 3, 4, 5]) # 120 elements
itertools.chain.from_iterable([[1, 2], [3, 4, 5]]) # iterator over 1, 2, 3, 4, 5
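All of these calls return lazy iterators, so nothing is computed until you consume them. A small sketch that materializes each one to confirm the element counts quoted above:

```python
import itertools

items = [1, 2, 3, 4, 5]

# Wrap each iterator in list() to force evaluation and count the elements.
counts = [
    len(list(itertools.product([1, 2], repeat=3))),
    len(list(itertools.product([1, 2], [3, 4, 5]))),
    len(list(itertools.combinations(items, r=3))),
    len(list(itertools.permutations(items, r=3))),
    len(list(itertools.permutations(items))),
]
flattened = list(itertools.chain.from_iterable([[1, 2], [3, 4, 5]]))

print(counts)     # [8, 6, 10, 60, 120]
print(flattened)  # [1, 2, 3, 4, 5]
```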
Environment variables
Export variables in a terminal
# export FOO=$PWD
export FOO=/path/to/folder/output
export FOO="/path/to/folder/out put" # use quotation marks if space characters are included
Make use of them inside a Python script
import os
print(*os.environ.items(), sep = '\n')
value = os.environ.get('FOO') # a string, or None if FOO is not set
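A variable that was never exported is a common source of surprises, so it can help to handle that case explicitly. In the sketch below the variable is set from inside Python only to keep the example self-contained; `DEMO_UNSET_VARIABLE` and the fallback path are made-up names.

```python
import os

# Simulate `export FOO=...` for this process only (illustration).
os.environ['FOO'] = '/path/to/folder/out put'
# Make sure the demo variable really is unset.
os.environ.pop('DEMO_UNSET_VARIABLE', None)

value = os.environ.get('FOO')                               # the exported string
missing = os.environ.get('DEMO_UNSET_VARIABLE')             # None, no exception
fallback = os.environ.get('DEMO_UNSET_VARIABLE', '/tmp/x')  # default when unset

print(value, missing, fallback)
```

By contrast, `os.environ['DEMO_UNSET_VARIABLE']` would raise a KeyError.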
Latin characters
lowerCaseCharacters = [chr(x) for x in range(ord('a'), ord('z') + 1)]
upperCaseCharacters = [chr(x) for x in range(ord('A'), ord('Z') + 1)]
allCharacters = lowerCaseCharacters + upperCaseCharacters
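The standard library already ships these characters as constants in the `string` module, so the lists above can also be obtained as follows (a short sketch that checks both versions match):

```python
import string

# Built with chr/ord, as in the post.
lowerCaseCharacters = [chr(x) for x in range(ord('a'), ord('z') + 1)]
upperCaseCharacters = [chr(x) for x in range(ord('A'), ord('Z') + 1)]

# Built from the constants in the string module.
lowerFromString = list(string.ascii_lowercase)
upperFromString = list(string.ascii_uppercase)
allFromString = list(string.ascii_letters)  # lowercase first, then uppercase

print(lowerFromString == lowerCaseCharacters)  # True
print(allFromString == lowerCaseCharacters + upperCaseCharacters)  # True
```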
Intersection among lists
def intersectionAmongLists(list_):
    return set(list_[0]).intersection(*list_[1:])
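A short usage sketch with made-up input. Note that the function expects at least one inner list; an empty outer list would raise an IndexError at `list_[0]`.

```python
def intersectionAmongLists(list_):
    # Turn the first list into a set, then intersect it with all the others.
    return set(list_[0]).intersection(*list_[1:])

common = intersectionAmongLists([[1, 2, 3], [2, 3, 4], [3, 2, 9]])
single = intersectionAmongLists([[1, 1, 2]])

print(common)  # {2, 3}
print(single)  # {1, 2}
```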
Number of combinations
import math # math.comb requires Python 3.8 or newer
math.comb(7, 2) # 21
math.comb(7, 8) # 0
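As a cross-check, the same numbers can be obtained from the factorial formula and by simply enumerating the combinations. This is only a sketch to show that the three agree.

```python
import itertools
import math

n, k = 7, 2
viaComb = math.comb(n, k)
viaFactorials = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
viaEnumeration = len(list(itertools.combinations(range(n), k)))
print(viaComb, viaFactorials, viaEnumeration)  # 21 21 21

# Choosing more items than available yields no combinations at all.
tooMany = len(list(itertools.combinations(range(7), 8)))
print(math.comb(7, 8), tooMany)  # 0 0
```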
The complex way and the intuitive way
# it took me one day to be able to write this code snippet
value = (True, False)
if all(value):
    pass
elif not any(value):
    pass
else:
    pass
# but it took me one year to realize that the code below is much clearer
value = (True, False)
if value == (True, True):
    pass
elif value == (False, False):
    pass
else:
    pass
More tricks to be appended …
Hope you enjoyed the post. Please leave a comment if you have any useful tricks in Python to share with others.