Python: Generate all combinations of a list
Nathan and I have been playing around with different scikit-learn machine learning classifiers and we wanted to run different combinations of features through each one and work out which gave the best result.
We started with a list of features:
all_columns = ["Fare", "Sex", "Pclass", "Embarked"]
itertools.combinations allows us to create combinations with a length of our choice:
>>> import itertools as it
>>> list(it.combinations(all_columns, 3))
[('Fare', 'Sex', 'Pclass'), ('Fare', 'Sex', 'Embarked'), ('Fare', 'Pclass', 'Embarked'), ('Sex', 'Pclass', 'Embarked')]
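As a quick sanity check on that output, the number of combinations of length r drawn from n items is the binomial coefficient "n choose r", which math.comb computes (assuming Python 3.8 or later). A minimal sketch, reusing the column names from above; the name triples is just for illustration:

```python
import itertools as it
import math

all_columns = ["Fare", "Sex", "Pclass", "Embarked"]

triples = list(it.combinations(all_columns, 3))

# 4 choose 3 = 4, matching the four tuples in the output above
assert len(triples) == math.comb(len(all_columns), 3)

# Within each tuple the items keep the order they had in the input list
print(triples[0])  # ('Fare', 'Sex', 'Pclass')
```

The same check works for any length r, which is handy when the feature list grows and the output is too long to eyeball.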
We wanted to create combinations of arbitrary length, so we combined a few invocations of that function like this:
>>> list(it.combinations(all_columns, 2)) + list(it.combinations(all_columns, 3))
[('Fare', 'Sex'), ('Fare', 'Pclass'), ('Fare', 'Embarked'), ('Sex', 'Pclass'), ('Sex', 'Embarked'), ('Pclass', 'Embarked'), ('Fare', 'Sex', 'Pclass'), ('Fare', 'Sex', 'Embarked'), ('Fare', 'Pclass', 'Embarked'), ('Sex', 'Pclass', 'Embarked')]
If we generalise that code to remove the repetition, we end up with the following:
all_the_features = []
for r in range(1, len(all_columns) + 1):
    # += extends the list in place; a bare + would build a new list and throw it away
    all_the_features += list(it.combinations(all_columns, r))
>>> all_the_features
[('Fare',), ('Sex',), ('Pclass',), ('Embarked',), ('Fare', 'Sex'), ('Fare', 'Pclass'), ('Fare', 'Embarked'), ('Sex', 'Pclass'), ('Sex', 'Embarked'), ('Pclass', 'Embarked'), ('Fare', 'Sex', 'Pclass'), ('Fare', 'Sex', 'Embarked'), ('Fare', 'Pclass', 'Embarked'), ('Sex', 'Pclass', 'Embarked'), ('Fare', 'Sex', 'Pclass', 'Embarked')]
Or, if we want to use reduce instead (in Python 3 it has to be imported from functools):
>>> from functools import reduce
>>> reduce(lambda acc, x: acc + list(it.combinations(all_columns, x)), range(1, len(all_columns) + 1), [])
[('Fare',), ('Sex',), ('Pclass',), ('Embarked',), ('Fare', 'Sex'), ('Fare', 'Pclass'), ('Fare', 'Embarked'), ('Sex', 'Pclass'), ('Sex', 'Embarked'), ('Pclass', 'Embarked'), ('Fare', 'Sex', 'Pclass'), ('Fare', 'Sex', 'Embarked'), ('Fare', 'Pclass', 'Embarked'), ('Sex', 'Pclass', 'Embarked'), ('Fare', 'Sex', 'Pclass', 'Embarked')]
I imagine there is probably a simpler way that I don’t know about yet…!
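One candidate for that simpler way is itertools.chain.from_iterable, which flattens a sequence of iterables into one. The itertools documentation uses the same idea in its powerset recipe, although that recipe also includes the empty tuple. This is only a sketch; the helper name all_combinations is made up for illustration:

```python
import itertools as it

all_columns = ["Fare", "Sex", "Pclass", "Embarked"]

def all_combinations(items):
    """Return every non-empty combination of items, shortest first."""
    items = list(items)
    # One combinations() iterator per length, chained into a single stream
    return list(it.chain.from_iterable(
        it.combinations(items, r) for r in range(1, len(items) + 1)
    ))

all_the_features = all_combinations(all_columns)
print(len(all_the_features))  # 15, i.e. 2**4 - 1
```

Dropping the outer list() call would keep everything lazy, which may matter with more columns, since the number of combinations doubles with each one added.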
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.