·
python
Python: scikit-learn: ImportError: cannot import name __check_build
In part 3 of Kaggle’s series on text analytics I needed to install scikit-learn and having done so ran into the following error when trying to use one of its classes:
>>> from sklearn.feature_extraction.text import CountVectorizer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/sklearn/__init__.py", line 37, in <module>
from . import __check_build
ImportError: cannot import name __check_build
This error doesn’t reveal very much but I found that when I exited the REPL and tried the same command again I got a different error which was a bit more useful:
>>> from sklearn.feature_extraction.text import CountVectorizer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/sklearn/__init__.py", line 38, in <module>
from .base import clone
File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/sklearn/base.py", line 10, in <module>
from scipy import sparse
ImportError: No module named scipy
The fix for this is now obvious:
$ pip install scipy
And I can now load CountVectorizer without any problem:
$ python
Python 2.7.5 (default, Aug 25 2013, 00:04:04)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sklearn.feature_extraction.text import CountVectorizer
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.