PythonWise: python

Showing posts with label python.

Thursday, July 09, 2020

Using module dir and getattr for configuration

PEP 562 added support for module level __dir__ and __getattr__.

__dir__ is called when the built-in dir function is called on the module
__getattr__ is called when an attribute is not found via the regular attribute lookup

Let's use this to build an environment based configuration module.

Configuration values have a value, an environment key and a function to convert from str to the right type

I'm going to use dataclasses and populate values from environment in __post_init__
Complex data types (such as list) should be JSON encoded in the environment variables

All configuration values will start with the c_ prefix
__dir__ will return a list of configuration variables without the c_ prefix
__getattr__ will add the c_ prefix and will look for the variables in globals

We add the c_ prefix and then strip it to bypass the regular attribute lookup mechanism. If we named a variable http_port and the user wrote config.http_port, the regular lookup would succeed and our __getattr__ function wouldn't be called.

Here's the code

Thursday, March 05, 2020

Using getattr for nicer configuration API

Typically, you'll read configuration from files (such as YAML) and get it as a dictionary. However, in Python you'd rather write config.httpd.port than config['httpd']['port'].

__getattr__ is a hook method that's called by Python when regular attribute lookup fails (not to be confused with the lower level __getattribute__, which is much harder to work with). You can use it to wrap the configuration dictionary. Here's a small example.
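
A minimal sketch of such a wrapper follows. The Config class name and the sample dictionary are illustrative assumptions.

```python
class Config:
    """Wrap a (possibly nested) dict and expose its keys as attributes."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Reached only when regular lookup fails. Private names are
        # rejected so a missing _data can't cause infinite recursion.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access keeps working
        return Config(value) if isinstance(value, dict) else value


cfg = Config({'httpd': {'port': 8080, 'hosts': ['localhost']}})
print(cfg.httpd.port)  # 8080
```

Missing keys raise AttributeError, so hasattr and getattr with a default behave as expected.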

Wednesday, March 20, 2019

Speed: Default value vs checking for None

Python's dict has a get method. It returns the existing value for a given key, or a default value if the key is not in the dict. It's very tempting to write code like val = d.get(key, Object()), but you need to think about the performance implications. Since function arguments are evaluated before the function is called, a new Object is created regardless of whether the key is in the dict. Let's see how this affects performance.

get_default will create a new Location every time, while get_none will create one only if there's no such object. This works since or evaluates its operands lazily and stops at the first truthy one.
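
The two functions being compared could look roughly like this. It's a sketch: the Location fields (x, y) and the (0, 0) default are assumptions.

```python
from collections import namedtuple

# Assumed shape of the stored objects
Location = namedtuple('Location', ['x', 'y'])


def get_default(d, key):
    # Location(0, 0) is built before d.get is called, even if key exists
    return d.get(key, Location(0, 0))


def get_none(d, key):
    # Location(0, 0) is built only when d.get returns a falsy value
    return d.get(key) or Location(0, 0)
```

Note that the or version also falls back to the default when the stored value is falsy (0, an empty string, an empty list), so it fits best when the values are always truthy objects.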

First we'll try with a missing key:

In [1]: %run default_vs_none.py
In [2]: locations = {} # name -> Location
In [3]: %timeit get_default(locations, 'carmen')
384 ns ± 2.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit get_none(locations, 'carmen')
394 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Not so much difference. However if the key exists:

In [5]: locations['carmen'] = Location(7, 3)
In [6]: %timeit get_default(locations, 'carmen')
377 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit get_none(locations, 'carmen')
135 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

get_none is now almost three times faster (135 ns vs. 377 ns).

Tuesday, November 13, 2018

direnv

I use the command line a lot. Different projects require different settings, say a Python virtual environment, a GOPATH for installing Go packages and more.

I'm using direnv to help with per-project settings in the terminal. For every project I have a .envrc file which specifies the required settings. This file is automatically loaded once I change directory to the project directory or any of its subdirectories.

You'll need the following in your .zshrc

if whence direnv > /dev/null; then
    eval "$(direnv hook zsh)"
fi

Every time you create or change your .envrc, you'll need to run direnv allow to approve it and make sure it's loaded. (If you made some changes and want to check them, run "cd .")

Here are some .envrc examples for various scenarios:

Python + pipenv

source $(pipenv --venv)/bin/activate

Go

export GOPATH=$(pwd | sed 's#/src/.*##')

export PATH=${GOPATH}/bin:${PATH}

This assumes your project's path looks like /path/to/project/src/github.com/project

If you're using the new Go modules (in 1.11+), you probably don't need this.

Python + virtualenv

source venv/bin/activate

Python + conda

source activate env-name

Replace env-name with the name of your conda environment.

Thursday, July 26, 2018

Specifying test cases for pytest using TOML

Say you have a function that converts text and you'd like to test it. You can write a directory with input and output files and use pytest.mark.parametrize to iterate over the cases. The problem is that the input and the output are in different files, so it's hard to see them next to each other.

If the text for testing is not that long, you can place all the cases in a configuration file. In this example I'll use the TOML format to hold the cases; each case is a table in an array of tables. You can probably do the same with multi-document YAML.

Here are the test cases
And here's the testing code (mask removes passwords from the text)

When running pytest, you'll see the following:

$ python -m pytest -v
========================================= test session starts =========================================
platform linux -- Python 3.7.0, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/miki/.local/share/virtualenvs/pytest-cmp-F3l45TQF/bin/python
cachedir: .pytest_cache
rootdir: /home/miki/Projects/pythonwise/pytest-cmp, inifile:
collected 3 items

test_mask.py::test_mask[passwd] PASSED [ 33%]
test_mask.py::test_mask[password] PASSED [ 66%]
test_mask.py::test_mask[no change] PASSED [100%]

====================================== 3 passed in 0.01 seconds =======================================

Tuesday, February 27, 2018

Python's iter & functools.partial

Python's built-in iter function is mostly used to extract an iterator from an iterable (if you're confused by this, see Raymond's excellent answer on StackOverflow).

iter has a second form, where it takes a function with no arguments and a sentinel value. This doesn't seem that useful, but check out the readline example in the documentation. iter will call the function repeatedly until the function returns the sentinel value, and then it will stop.

What happens when the function producing the values requires some arguments? In this case we can use functools.partial to create a zero-argument function.

Here's an example of a simple HTTP client using raw sockets; we can call join directly on the iterator.
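
A minimal sketch of such a client, assuming plain HTTP/1.0 on port 80 so the server closes the connection when it's done. The function names and the chunk size are illustrative.

```python
import socket
from functools import partial


def read_all(sock, chunk_size=4096):
    # sock.recv needs an argument, so partial turns it into a zero-argument
    # function. recv returns b'' once the peer closes the connection,
    # which is the sentinel that stops iter.
    read = partial(sock.recv, chunk_size)
    return b''.join(iter(read, b''))


def http_get(host, path='/', port=80):
    # HTTP/1.0 so the server closes the connection after the response
    request = f'GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n'
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode('ascii'))
        return read_all(sock)
```

Calling http_get('example.com') would return the raw response, headers included, as bytes.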

Friday, December 08, 2017

Advent of Code 2017 #8 and Python's eval

I'm having fun solving Advent of Code 2017. Problem 8 reminded me of the power of Python's eval (and before you start commenting that "eval is evil", may I remind you of this :)

You can check the Go implementation, which doesn't have eval and needs to work harder.
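
To illustrate where eval helps, here's a sketch that assumes instructions of the form "b inc 5 if a > 1" (register, inc or dec, amount, then a condition on another register), with all registers starting at 0. The sample program is made up for illustration.

```python
from collections import defaultdict


def run(program):
    regs = defaultdict(int)  # registers start at 0
    for line in program.splitlines():
        reg, op, amount, _, cond_reg, cmp_op, cond_val = line.split()
        # eval does the comparison, e.g. eval('0 > 1'); no need to map
        # '>', '<', '>=', '==', '!=' ... to functions by hand
        if eval(f'{regs[cond_reg]} {cmp_op} {cond_val}'):
            delta = int(amount)
            regs[reg] += delta if op == 'inc' else -delta
    return regs


program = '''\
b inc 5 if a > 1
a inc 1 if b < 5
c dec -10 if a >= 1
c inc -20 if c == 10'''
regs = run(program)
print(max(regs.values()))  # 1
```

As always with eval, this is only reasonable when the input is trusted, as with puzzle input.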

Saturday, July 15, 2017

Generating Power Set using Bitmap

I was asked to write a function that generates the power set of a set of items. At first I wrote a recursive algorithm, but then another approach came to mind. When you calculate how many subsets there are, you can say that each item in the original set can either be or not be in a subset, which means 2^n subsets. This yes/no for inclusion can be seen as a bitmask, and since we know that there are 2^n subsets, we can use the numbers from 0 to 2^n-1 as bitmasks.
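
The bitmask approach can be sketched as follows (the function name is illustrative):

```python
def power_set(items):
    items = list(items)
    n = len(items)
    # Every number in [0, 2^n) is a bitmask; bit i says whether
    # items[i] is included in the subset
    for mask in range(1 << n):
        yield [item for i, item in enumerate(items) if mask & (1 << i)]


print(list(power_set('ab')))  # [[], ['a'], ['b'], ['a', 'b']]
```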

Monday, June 19, 2017

Who Touched the Code Last? (git)

Sometimes I'd like to know who to ask about a piece of code. I've developed a little Python script that shows the last people who touched a file/directory and the ones who touched it most.

Example output (on arrow project)
$ owners
Most: Wes McKinney (39.5%), Uwe L. Korn (15.3%), Kouhei Sutou (10.8%)
Last: Kengo Seki (31M), Uwe L. Korn (39M), Max Risuhin (23H)

So ask Wes or Uwe :)

Here's the code:
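
The core idea could be sketched like this. It's a simplified take that assumes git is on the PATH and uses git log author names; the function names are illustrative, and the relative times shown in the output above are left out.

```python
import subprocess
from collections import Counter


def git_authors(path='.'):
    """Author name of every commit touching path, newest first."""
    cmd = ['git', 'log', '--format=%an', '--', path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


def most(authors, n=3):
    """Top n authors with their share of commits, in percent."""
    counts = Counter(authors)
    total = len(authors)
    return [(name, 100 * count / total) for name, count in counts.most_common(n)]


def last(authors, n=3):
    """The n distinct authors who touched the code most recently."""
    seen = []
    for name in authors:
        if name not in seen:
            seen.append(name)
        if len(seen) == n:
            break
    return seen
```

Feeding git_authors(path) to most and last gives the two lines of the report.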

Monday, December 26, 2016

Automatically Running BigQuery Flows

On a current project we're using Google's BigQuery to crunch some petabyte-scale data. We have several SQL scripts that we need to run in a specific order. The script below detects the table dependencies and runs the SQL scripts in order. As a bonus, you can run it with --view and it'll show you the dependency graph.
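
The dependency detection and ordering part could be sketched as below. This is an illustration under several assumptions: each script creates exactly one table whose name we know, dependencies are found with a naive FROM/JOIN regular expression, and the ordering uses graphlib (Python 3.9+). It doesn't talk to BigQuery or draw the graph.

```python
import re
from graphlib import TopologicalSorter


def dependencies(sql):
    """Table names that appear after FROM or JOIN (naive regex)."""
    return set(re.findall(r'\b(?:FROM|JOIN)\s+`?([\w.]+)`?', sql, flags=re.I))


def run_order(scripts):
    """scripts maps the table a script creates to the script's SQL text."""
    graph = {}
    for table, sql in scripts.items():
        # Keep only dependencies that are produced by another script
        graph[table] = (scripts.keys() & dependencies(sql)) - {table}
    # static_order yields every table after the tables it depends on
    return list(TopologicalSorter(graph).static_order())


scripts = {
    'report': 'SELECT * FROM daily JOIN users ON daily.uid = users.id',
    'daily': 'SELECT * FROM raw_events',
    'users': 'SELECT * FROM raw_users',
}
print(run_order(scripts))  # daily and users come before report
```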

Saturday, November 12, 2016

Weighted Rating

Notebook here.

Friday, September 16, 2016

Simple Object Pools

Sometimes we need object pools to limit the number of resources consumed. The most common example is database connections.

In Go we sometimes use a buffered channel as a simple object pool.

In Python, we can do something similar with a Queue. Python's context manager makes the resource handling automatic, so clients don't need to remember to return the object.
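
A minimal sketch of the Python side; the Pool class and its resource method are illustrative names.

```python
from contextlib import contextmanager
from queue import Queue


class Pool:
    """A simple object pool backed by a thread-safe Queue."""

    def __init__(self, resources):
        self._queue = Queue()
        for resource in resources:
            self._queue.put(resource)

    @contextmanager
    def resource(self):
        # get blocks until another client returns a resource
        resource = self._queue.get()
        try:
            yield resource
        finally:
            # Always return the resource, even if the client raised
            self._queue.put(resource)


pool = Pool([1, 2, 3])
with pool.resource() as r:
    print('got resource', r)  # got resource 1
```

Worker threads can share one Pool; at most three of them hold a resource at any given time, and the rest wait in get.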

Here's the output of both programs:


$ go run pool.go
worker 7 got resource 0
worker 0 got resource 2
worker 3 got resource 1
worker 8 got resource 2
worker 1 got resource 0
worker 9 got resource 1
worker 5 got resource 1
worker 4 got resource 0
worker 2 got resource 2
worker 6 got resource 1


$ python pool.py
worker 5 got resource 1
worker 8 got resource 2
worker 1 got resource 3
worker 4 got resource 1
worker 0 got resource 2
worker 7 got resource 3
worker 6 got resource 1
worker 3 got resource 2
worker 9 got resource 3
worker 2 got resource 1

Wednesday, August 24, 2016

Generate Relation Diagram from GAE ndb Model

Working with GAE, we wanted to create a relation diagram from our ndb model. By deferring the rendering to dot and using Python's reflection, this became an easy task. Some links are still missing since we're using ancestor queries, but this can be handled by some class docstring syntax or by manually editing the resulting dot file.

Tuesday, July 05, 2016

Testing numpy Code

Notebook here.

Friday, June 10, 2016

Work with AppEngine SDK in the REPL

Working again with AppEngine for Python. Here's a small code snippet that will let you work with your code in the REPL (much better than the previous solution).
What I do in IPython is:

In [1]: %run initgae.py

In [2]: %run app.py

And then I can work with my code and test things out.

Tuesday, March 29, 2016

Slap a --help on it

Sometimes we write "one off" scripts to deal with a certain task. However, more often than not, these scripts live on well past that one time. This is very common in ops related code, which for some reason people don't apply the regular coding standards to.

It really upsets me when I try to see what a script is doing, run it with the --help flag and it happily deletes the database while I wait :) It's so easy to add help support in the command line. In Python we do it with argparse, and we roll our own in bash. In both cases it's about 3 extra lines of code.

Please be kind to your future self and add --help support to your scripts.
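
In Python, the extra lines could look like this sketch. The script description and the main function are made up for illustration.

```python
"""Delete stale rows from the database (hypothetical one-off script)."""
from argparse import ArgumentParser


def main(argv=None):
    # These lines are all it takes: --help prints the usage and exits
    # before any real work starts
    parser = ArgumentParser(description=__doc__)
    parser.parse_args(argv)

    print('deleting stale rows ...')  # the actual work goes here
    return 0
```

A script would call main() at the bottom; argparse then reads the arguments from sys.argv.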

Tuesday, February 23, 2016

Removing String Columns from a DataFrame

Sometimes you want to work just with numerical columns in a pandas DataFrame. The rule of thumb is that everything that has a type of object is something not numeric (you can get fancier with numpy.issubdtype). We're going to use the DataFrame dtypes with some boolean indexing to accomplish this.


In [1]: import pandas as pd  

In [2]: df = pd.DataFrame([
   ...:     [1, 2, 'a', 3],
   ...:     [4, 5, 'b', 6],
   ...:     [7, 8, 'c', 9],
   ...: ])  

In [3]: df  
Out[3]: 
   0  1  2  3
0  1  2  a  3
1  4  5  b  6
2  7  8  c  9

In [4]: df.dtypes  
Out[4]: 
0     int64
1     int64
2    object
3     int64
dtype: object

In [5]: df[df.columns[df.dtypes != object]]
Out[5]: 
   0  1  3
0  1  2  3
1  4  5  6
2  7  8  9

Wednesday, November 11, 2015

aenumerate - enumerate for async for

Python's new async/await syntax helps a lot with writing async code. Here's a little utility that provides the async equivalent of enumerate.
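
One way to write it is as an async generator, which assumes Python 3.6+. This is a sketch and may differ from the original utility.

```python
import asyncio


async def aenumerate(aiterable, start=0):
    """Async equivalent of enumerate: yields (index, item) pairs."""
    index = start
    async for item in aiterable:
        yield index, item
        index += 1


async def letters():
    # Stand-in async iterable for the demo
    for letter in 'abc':
        await asyncio.sleep(0)
        yield letter


async def main():
    async for i, letter in aenumerate(letters()):
        print(i, letter)


asyncio.run(main())
```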