Installing Common Python Packages

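Why use Python

First, the language itself: Python is an interpreted language that is easy to pick up, quick to develop in, and easy to use.
Second, the community: it is active, the ecosystem is mature, and there is a large stock of libraries.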

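Python is used here for the following purposes

<1> Data collection (web crawling): Scrapy, Beautiful Soup, lxml
<2> Data management (text processing):
<3> Data mining: pandas, NumPy, SciPy, matplotlib
<4> Machine learning: scikit-learn
<5> Data presentation: Flask
<6> Programming environment: IPython Notebook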

0. Operating system:

Windows
Linux (ships with Python)

1. Install Python <look up the installation steps yourself>

Set the environment variable: add C:\Python27 and C:\Python27\Scripts to PATH.
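
A quick way to confirm the setting is to inspect PATH from Python. The sketch below is only an illustration: the helper name missing_path_entries is made up for this post, and the two directories are the ones named above.

```python
import os


def missing_path_entries(path_value, required, sep=os.pathsep):
    """Return the directories from `required` that are not listed in `path_value`."""

    def norm(p):
        # Windows paths are case-insensitive, so compare lowercase forms
        # with surrounding spaces and trailing slashes removed.
        return p.strip().rstrip("\\/").lower()

    present = set(norm(p) for p in path_value.split(sep) if p.strip())
    return [d for d in required if norm(d) not in present]


if __name__ == "__main__":
    required = [r"C:\Python27", r"C:\Python27\Scripts"]
    missing = missing_path_entries(os.environ.get("PATH", ""), required)
    print("missing from PATH: {}".format(missing))
```

An empty list means both directories are already on PATH; open a new terminal after changing the variable so the new value is picked up.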

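2. Common packages and how to install them:

 jupyter and IPython provide the IPython Notebook and the IPython shell
 NumPy is a scientific computing package for Python
 pandas is a data analysis package built on NumPy that provides higher-level data structures and tools
 SciPy is an open-source Python library of algorithms and mathematical tools
 Plotting and visualization rely on the matplotlib module, whose style resembles MATLAB's
 scikit-learn is an open-source machine learning module built on SciPy and NumPy
 NLTK (Natural Language Toolkit) is a Python natural language processing module that includes a range of text-processing tools and statistical language models
 pip install --index https://pypi.mirrors.ustc.edu.cn/simple/ beautifulsoup4
 pip install --index https://pypi.mirrors.ustc.edu.cn/simple/ matplotlib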
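
Once the installs finish, a short script can report which packages actually import. This is a minimal sketch, not part of the original steps; check_packages is a hypothetical helper, and the list uses import names, which sometimes differ from the pip names.

```python
import importlib


def check_packages(names):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Import names, not pip names:
    # beautifulsoup4 is imported as bs4, scikit-learn as sklearn.
    wanted = ["numpy", "pandas", "scipy", "matplotlib", "sklearn", "nltk", "bs4"]
    for name, ok in sorted(check_packages(wanted).items()):
        print("{:<12} {}".format(name, "ok" if ok else "MISSING"))
```

Anything reported as MISSING still needs a pip install.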

3. Problems encountered and how to fix them:

<1>  pip install jupyter
Problem: the following error appears:
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) 
after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out. 
(read timeout=15)",)': /simple/pip/
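Cause: a network problem; the Great Firewall gets in the way of the connection to pypi.python.org.
Fix:
    Download Python libraries from a mirror inside China.
    Taking jupyter and pandas as examples, enter the commands in a terminal (assuming Python is installed correctly):
pip install  --index https://pypi.mirrors.ustc.edu.cn/simple/ jupyter
pip install  --index https://pypi.mirrors.ustc.edu.cn/simple/ pandas       ## numpy is a dependency, so it is pulled in automatically and needs no separate install
Note: the URL after --index can be replaced with another mirror, for example http://mirrors.sohu.com/python/
 To make a mirror the default,
create or edit the pip configuration file (on Linux it is ~/.pip/pip.conf; on Windows it is %HOMEPATH%\pip\pip.ini)
with the following content:
code: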
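
The same install can be scripted. The sketch below builds the pip command as an argument list; the function names are hypothetical, and the option is spelled out as --index-url, which is the name pip documents for it.

```python
import subprocess
import sys

USTC_MIRROR = "https://pypi.mirrors.ustc.edu.cn/simple/"


def pip_install_command(package, index_url=USTC_MIRROR):
    """Build the argument list for installing `package` from `index_url`."""
    # "python -m pip" runs the pip that belongs to this interpreter.
    return [sys.executable, "-m", "pip", "install",
            "--index-url", index_url, package]


def install(package, index_url=USTC_MIRROR):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.call(pip_install_command(package, index_url))
```

Calling install("pandas") runs the command; passing a different index_url switches mirrors without touching any configuration file.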
[global]  
index-url = http://pypi.douban.com/simple  
After this, pip uses that mirror by default when installing. <Note: write the lines flush left, with no leading spaces>
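
The configuration file can also be generated from Python. This is a sketch under stated assumptions: the helper names are invented for illustration, and the path passed to write_pip_config should be whichever location from the text applies to your system.

```python
import os


def pip_config_text(index_url):
    """Return the contents of a pip config file that sets the default index."""
    return "[global]\nindex-url = {}\n".format(index_url)


def write_pip_config(path, index_url):
    """Create the parent directory if needed and write the config file."""
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as handle:
        handle.write(pip_config_text(index_url))
    return path


if __name__ == "__main__":
    # Print only; call write_pip_config(...) yourself once the path is right.
    print(pip_config_text("http://pypi.douban.com/simple"))
```

For example, write_pip_config(os.path.expanduser("~/.pip/pip.conf"), "http://pypi.douban.com/simple") would produce the Linux file described above, overwriting any existing one.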

<2>
pip install  --index https://pypi.mirrors.ustc.edu.cn/simple/ scipy
The following error appears:
raise NotFoundError('no lapack/blas resources found')
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found   
Command "c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\yt\\appdata\\local\\temp\\pip-build-xyyjlc\\SciPy\\setup.py';
f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" 
install  --record c:\users\Test\appdata\local\temp\pip-zczpj_-record\install-record.txt 
--single-version-externally-managed 
--compile" failed with error code 1 in c:\users\Test\appdata\local\temp\pip-build-xyyjlc\SciPy\

Commands for installing the crawler packages

pip install  --index https://pypi.mirrors.ustc.edu.cn/simple/ lxml
pip install  --index https://pypi.mirrors.ustc.edu.cn/simple/ Scrapy
The following error appears:
error: Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat). Get it from http://aka.ms/vcpython27

Command "c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\Test\\appdata\\local\\temp\\pip-build-xeeujn\\lxml\\setup.py';
f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" 
install --record c:\users\Test\appdata\local\temp\pip-dlfby2-record\install-record.txt 
--single-version-externally-managed 
--compile" failed with error code 1 in c:\users\Test\appdata\local\temp\pip-build-xeeujn\lxml\  
Cause: see https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/
Fix:
    ### Cause 1: pip is too old; upgrade it
    Actual cause: the C compiler that pip needs to build the package is missing
  Option 1: install VS2008
  Option 2: install MinGW
  Option 3: install a precompiled wheel file:
    A wheel file is a built-package format for Python, similar to a special kind of zip file, with the ".whl" extension.
    Wheel files are installed with pip. First install the wheel package: pip install --index https://pypi.mirrors.ustc.edu.cn/simple/ wheel
    Then download the whl file for lxml and install it with pip.
    References for option 3:
     http://pythonwheels.com/
    References for option 2:
     https://my.oschina.net/zhangdapeng89/blog/54407
     http://cache.baiducontent.com/c?m=9f65cb4a8c8507ed4fece76310579135480ddd276b97844b22918448e435061e5a25a4ec66644b598f8461670bac4c56eefb2b2577437df78cc8fe0d81e8c276789f274236419141648742f39d5125b764c11ebaef58b4e4ae2593d8828099401697135375d7b0cb074003cb1fe71447f4a7e90f490c40e4ba27648f4e775f882230f01aeee1427907f7e1dc2c189876867610e7b835c62913c557f55b486405b74dc15b0f7827e13f30ff322a05e2bb0ea1&p=897ed416d9c105ff57ee957d6142bb&newp=c065de1c94904ead08e2977f0e0d9c231610db2151d7da1f6b82c825d7331b001c3bbfb42323160fd4c2776001ad4a59e1f7357931092ba3dda5c91d9fb4c57479&user=baidu&fm=sc&query=error%2Dunable%2Dto%2Dfind%2Dvcvarsall%2Dbat&qid=eb2e002800040eed&p1=4
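
For the wheel option, the downloaded file has to match the interpreter, and the filename says what it was built for. The sketch below splits a wheel filename into the fields defined by the wheel format (name, version, optional build tag, Python tag, ABI tag, platform tag). The function name and the example filename are hypothetical, chosen only to show the layout.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: {}".format(filename))
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename: {}".format(filename))
    return {"name": name, "version": version, "build": build,
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


if __name__ == "__main__":
    # Hypothetical filename, used only to show the layout.
    info = parse_wheel_filename("lxml-3.6.0-cp27-none-win32.whl")
    print("{} {}".format(info["python"], info["platform"]))
```

For the Python 2.7 setup in this post the Python tag should read cp27, and the platform tag should match the interpreter: win32 for 32-bit Python, win_amd64 for 64-bit.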

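Installing Scrapy on Ubuntu

Required environment:
sudo apt-get install python-lxml python-dev libffi-dev libssl-dev
pip install scrapy
Test whether the installation works with: scrapy version
Create a new project named weather: scrapy startproject weather
Enter the main directory: cd weather  // the main directory is the folder that contains scrapy.cfg
scrapy.cfg: the project's configuration file.
weather/: the project's Python module. Code is added here later.
weather/items.py: the project's items file.
weather/pipelines.py: the project's pipelines file.
weather/settings.py: the project's settings file.
weather/spiders/: the directory that holds the spider code.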
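
To confirm that startproject produced the layout listed above, the paths can be checked from Python. This is a small sketch with hypothetical helper names; it looks only for the files and folders named in this post.

```python
import os


def expected_project_paths(project):
    """Paths, relative to the main directory, named in the layout above."""
    return [
        "scrapy.cfg",
        os.path.join(project, "items.py"),
        os.path.join(project, "pipelines.py"),
        os.path.join(project, "settings.py"),
        os.path.join(project, "spiders"),
    ]


def missing_project_paths(root, project):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in expected_project_paths(project)
            if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    # Run from the main directory (the one that contains scrapy.cfg).
    print(missing_project_paths(".", "weather"))
```

An empty list means every expected file and folder is in place.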

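Installing and using a virtual environment

Install virtualenv:
sudo apt install virtualenv
Create a virtual environment:
virtualenv Spiders
Spiders is the name of the new virtual environment. A folder with the same name, Spiders, is created as well; it holds an independent Python execution environment.
Enter the virtual environment:
source Spiders/bin/activate
After entering, the command prompt is prefixed with the environment's name, for example: (Spiders)user@machine:~$
Leave the virtual environment:
deactivate
Delete the virtual environment:
rm -r Spiders
Deleting the Spiders folder that holds the virtual environment removes the environment we created.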
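
Besides the prompt, Python itself can tell whether it is running inside a virtual environment. The sketch below is an illustration with a hypothetical function name: older virtualenv releases set sys.real_prefix, while newer virtualenv releases and the standard venv module leave sys.base_prefix pointing at the system installation.

```python
import sys


def in_virtual_environment(prefix, base_prefix=None, real_prefix=None):
    """Decide from the interpreter's prefixes whether a virtual environment is active."""
    # Older virtualenv: sys.real_prefix exists only inside an environment.
    if real_prefix is not None:
        return True
    # venv and newer virtualenv: base_prefix differs from prefix when active.
    return base_prefix is not None and base_prefix != prefix


if __name__ == "__main__":
    active = in_virtual_environment(
        sys.prefix,
        getattr(sys, "base_prefix", None),
        getattr(sys, "real_prefix", None),
    )
    print("inside a virtual environment: {}".format(active))
    print("interpreter: {}".format(sys.executable))
```

Run it once after activating Spiders and once after deactivate; the interpreter path should move between the Spiders folder and the system Python.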

Recommended books

Basics

《像计算机科学家一样思考Python 第2版》<Think Python, 2nd Edition>
《Python入门经典:以解决计算问题为导向的Python编程实践》<The Practice of Computing Using Python> (the English edition is recommended)

Crawling

《Python网络数据采集》<Web Scraping with Python: Collecting Data from the Modern Web>
《用Python写网络爬虫》<Web Scraping with Python>
《Python网络编程攻略》<Python Network Programming Cookbook>

Data management

《干净的数据:数据清洗入门与实践》<Clean Data>

Data mining and machine learning

《利用Python进行数据分析》<Python for Data Analysis>
《Python金融大数据分析》<Python for Finance: Analyze Big Financial Data>
《Python数据挖掘入门与实践》<Learning Data Mining with Python>

Web presentation

《Flask Web开发:基于Python的Web应用开发实战》<Flask Web Development: Developing Web Applications with Python>
