(Reblogged from http://blog.yhathq.com/posts/setting-up-scientific-python.html)
Setting Up Scientific Python
I’ve found that one of the most difficult parts of using the Scientific Python libraries is getting them installed and set up on my computer. pandas, scipy, numpy, and sklearn make heavy use of C/C++ extensions, which can be difficult to compile and configure on whatever flavor of OS you use. In this post I’ll go over the easiest way to install the libraries you need to get up and running with Scientific Python.
Getting Python
Step 1 is getting Python, of course! If you don’t already have Python 2.7 installed on your computer, select one of these distributions from python.org. Double-check first that you don’t already have Python 2.7 installed; many UNIX distributions ship with it.
- Python 2.7.3 Windows Installer
- Python 2.7.3 Windows X86-64 Installer
- Python 2.7.3 Mac OS X 64-bit/32-bit x86-64/i386 Installer
- Python 2.7.3 Mac OS X 32-bit i386/PPC Installer
- Python 2.7.3 compressed source tarball (for Linux, Unix or Mac OS X)
- Python 2.7.3 bzipped source tarball (for Linux, Unix or Mac OS X, more compressed)
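A quick way to double-check which interpreter you already have is to ask Python itself. The snippet below is a minimal sketch; the helper name `is_python_27` is made up for this post and is not part of any library.

```python
import sys


def is_python_27(version_info=None):
    """Return True if the given version tuple is Python 2.7.x.

    `is_python_27` is a hypothetical helper for this post; it defaults
    to the interpreter that is running the script.
    """
    if version_info is None:
        version_info = sys.version_info
    # sys.version_info behaves like a tuple: (major, minor, micro, ...)
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    if is_python_27():
        print("Python 2.7 is already installed; no need to download it.")
    else:
        print("This interpreter is not Python 2.7.")
```

Save it under any file name you like and run it with the `python` command you plan to use, since that is the interpreter it reports on.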
Installing pip
Next you need to install pip, the Python package manager.
OSX
$ curl -O http://python-distribute.org/distribute_setup.py
$ python distribute_setup.py
$ easy_install pip
Windows
Download Christoph Gohlke’s installer
Linux
Debian, Ubuntu
$ apt-get install python-pip
CentOS, Fedora
$ yum -y install python-pip
Make sure pip is on your PATH. If it isn’t, add your Python installation’s scripts directory (on Windows, typically C:\Python27\Scripts) to your PATH.
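If you aren’t sure whether pip is on your PATH, a few lines of Python can look for it. This is only a rough sketch: `find_on_path` is a hypothetical helper written for this post, and the list of Windows extensions it tries is an assumption rather than an exhaustive one.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of `program` if it is found on PATH, else None.

    `find_on_path` is a hypothetical helper written for this post.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    # Windows executables carry an extension such as .exe (assumed list)
    extensions = [""]
    if os.name == "nt":
        extensions += [".exe", ".bat", ".cmd"]
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("pip")
    if location:
        print("pip found at %s" % location)
    else:
        print("pip is not on your PATH")
```

The search walks the PATH entries in order and stops at the first match, which is roughly how a shell resolves a command name.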
Enthought Free Distribution
Enthought, which provides commercial support for Scientific Python, is nice enough to publish an installer that works on Windows, OSX, and Linux. This eliminates a lot of the headaches of compiling libraries yourself and ensures you get the most stable versions. There are different tiers of installers, including paid versions, but for most people the free version is all you’ll need. Their website is a little tricky to navigate (they sort of funnel you to the non-free versions), but here’s the page you want. Select the distribution for your OS and it’ll start the download. This part could take a while. Packed into the installer are the following libraries:
- scipy
- numpy
- ipython
- matplotlib
- pandas
- sympy
- nose
- traits
- chaco
Once the download finishes, double-clicking the installer will get you set up with everything, including adding all the libraries to your Python path.
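To confirm the installer did its job, you can try importing each library from the list above. The sketch below assumes the usual import names (note that ipython is imported as `IPython`); `check_imports` is a hypothetical helper, not something Enthought ships.

```python
import importlib

# Import names of the libraries listed above (ipython is imported as IPython)
ENTHOUGHT_LIBRARIES = ["scipy", "numpy", "IPython", "matplotlib", "pandas",
                       "sympy", "nose", "traits", "chaco"]


def check_imports(module_names):
    """Try to import each module; map its name to a version string or None.

    `check_imports` is a hypothetical helper for this post.
    """
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None  # not installed (or not importable)
            continue
        # Many, but not all, packages expose __version__
        results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    for name, version in sorted(check_imports(ENTHOUGHT_LIBRARIES).items()):
        print("%-12s %s" % (name, version if version else "MISSING"))
```

Anything reported as MISSING is worth a second look before moving on to the next step.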
Installing sklearn, statsmodels, and patsy
Now that we’ve got the core libraries installed, it’s time to add some fun stats packages. The Enthought distribution took care of the compiled dependencies. pip makes installing these libraries a breeze:
$ pip install --upgrade scikit-learn
$ pip install --upgrade statsmodels
$ pip install --upgrade patsy
These libraries are going to start spitting out a lot of garbage into the terminal during the install. Don’t worry, this is normal! You might want to take this time to have someone non-technical come by your computer.
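Once the output settles down, it’s worth checking that the three packages actually import. One thing that trips people up is that the pip name and the import name can differ: scikit-learn is imported as `sklearn`. Here is a minimal sketch; `missing_packages` and the `STATS_PACKAGES` mapping are names invented for this post.

```python
import importlib

# pip package name -> name used in an import statement.
# Note that scikit-learn is imported as "sklearn".
STATS_PACKAGES = {
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
    "patsy": "patsy",
}


def missing_packages(packages):
    """Return the sorted pip names whose import name cannot be imported.

    `missing_packages` is a hypothetical helper for this post.
    """
    missing = []
    for pip_name, import_name in packages.items():
        try:
            importlib.import_module(import_name)
        except ImportError:
            missing.append(pip_name)
    return sorted(missing)


if __name__ == "__main__":
    missing = missing_packages(STATS_PACKAGES)
    if missing:
        print("Still missing: " + ", ".join(missing))
    else:
        print("scikit-learn, statsmodels, and patsy all import cleanly.")
```

If something shows up as missing, scroll back through the install output for the first error; it usually names the culprit.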
Development environment
For ease of use and interactive computing, use IPython (http://ipython.org/), which also makes it easy to share your work as an IPython Notebook.
That’s it! You should be ready to go. If you run into any problems (this typically happens if you have previous versions of the libraries installed), check Stack Overflow or the numpy/pandas/sklearn docs.