Merge branch 'release-1.1.3'
bbengfort committed Mar 12, 2016
2 parents 8390ed1 + f6220ba commit 54647dc
Showing 8 changed files with 223 additions and 35 deletions.
113 changes: 82 additions & 31 deletions README.md
@@ -1,50 +1,49 @@
# Tribe
**Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis.**

**Social Network Analysis of Email**

<!-- [![PyPI version][pypi_img]][pypi_href] -->
[![PyPI version][pypi_img]][pypi_href]
[![Build Status][travis_img]][travis_href]
[![Coverage Status][coveralls_img]][coveralls_href]
<!-- [![Code Health][health_img]][health_href] -->
[![Code Health][health_img]][health_href]
[![Documentation Status][rtfd_img]][rtfd_href]
[![Stories in Ready][waffle_img]][waffle_href]

[![SNA Visualization](docs/images/sna_viz.png)](docs/images/sna_viz.png)

This repository contains code for the Social Network Analysis with Python
course that is being hosted by District Data Labs.
Tribe is a utility that lets you extract a network (a graph) from a communication medium that we all use often: our email. Tribe is designed to read an email mbox (a format that Python's standard library can read natively) and write the resulting graph to a GraphML file on disk. This utility is generally used for District Data Labs' Graph Analytics with Python and NetworkX course, but can be used by anyone interested in studying networks.

## Downloading your Data

This code will work with email data; and hopefully you have some to use for
the class. In particular, we will use a common format for email storage
called `mbox` - if you have Apple Mail, Thunderbird or Microsoft Outlook
you should be able to export your `mbox` with a bit of ease. If you have
[Gmail](https://gmail.com), follow these steps to export your mail:
One easy place to obtain a communications network for graph analysis is your email. Tribe extracts the relationships between unique email addresses by exploring who is connected through participation in the same email. In particular, we will use a common format for email storage called `mbox`. If you have Apple Mail, Thunderbird, or Microsoft Outlook, you should be able to export your mbox. If you have [Gmail](https://gmail.com), you may have to use an online email extraction tool. For more on downloading your data, see [Exporting an MBox from Email](http://ddl-tribe.readthedocs.org/en/latest/emails/).

## Extracting a Graph from Email

1. Download your email mbox; in this example it's in a file called `myemails.mbox`.

2. Install the tribe utility with `pip`:

1. Go to [https://www.google.com/settings/datatools](https://www.google.com/settings/datatools).
2. Click on "Create a new archive"
3. Select only Mail to be added to the archive
4. Select your compression format (zip for Windows, tgz for Mac)
5. Once the archive has been created, you will receive an email notification
$ pip install tribe

Make sure you do this in advance of the class; it can take hours or even
days for the archive to be created!
Note that you may need administrator privileges to do this.

For more information: [Download your Data: Per-service information](https://support.google.com/accounts/answer/3024195?hl=en)
3. Extract a graph from your email MBox as follows:

## Getting Started
$ tribe-admin.py extract -w myemails.graphml myemails.mbox

To work with this code, you'll need to set up your environment. Follow these steps to put together a _development ready environment_. The methodology varies somewhat across operating systems; the notes below are for Linux/Unix (including Mac OS X). Feel free to add Windows/Powershell instructions to help out as well.
Be patient, this could take some time: on my MacBook Pro it took 12 minutes to perform the complete extraction on a 7.5 GB MBox.

1. Clone this repository
You're now ready to get started analyzing your email network!
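
As a first sanity check on the extracted file, you could count its nodes and edges. The sketch below uses only the Python standard library and assumes the output declares the standard GraphML namespace; `summarize_graphml` is a hypothetical helper for illustration, not part of Tribe, and the attributes Tribe attaches to nodes and edges are not shown here.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace (assumed to be what the extracted file declares).
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(path):
    """Return (node_count, edge_count) for a GraphML file (hypothetical helper)."""
    root = ET.parse(path).getroot()
    # Search the whole tree so nested <graph> elements are counted as well.
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)
```

Calling `summarize_graphml("myemails.graphml")` returns a pair of counts; a result of `(0, 0)` probably means the extraction did not find any messages or the file uses a different namespace.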

## Developing for Tribe

To work with this code, you'll need to set up your environment. Follow these steps to put together a _development ready environment_. The methodology varies somewhat across operating systems; the notes below assume Linux/Unix (including Mac OS X).

1. Fork, then clone this repository

Using the git command line tool, this is a pretty simple step:

$ git clone https://github.com/DistrictDataLabs/tribe.git

Optionally, you can fork this repository into your own user directory, and clone that instead.

2. Change directories (cd) into the project directory

$ cd tribe
@@ -54,27 +53,79 @@ To work with this code, you'll need to do a few things to set up your environmen
Using `virtualenv` by itself:

$ virtualenv venv
$ source venv/bin/activate

Using `virtualenvwrapper` (configured correctly):

$ mkvirtualenv -a $(pwd) tribe

4. Install the required third party packages using `pip`:

$ pip install -r requirements.txt

Note, this may take a little while, but if you already have `matplotlib` and `pygraphviz` installed, you should have little trouble.
(venv)$ pip install -r requirements.txt

5. Test that everything is working:

$ python tribe-admin.py --help

You should see a help screen printed out.

### Contributing

Tribe is open source, and we'd love your help. If you would like to contribute, you can do so in the following ways:

1. Add issues or bugs to the bug tracker: [https://github.com/DistrictDataLabs/tribe/issues](https://github.com/DistrictDataLabs/tribe/issues)
2. Work on a card on the dev board: [https://waffle.io/DistrictDataLabs/tribe](https://waffle.io/DistrictDataLabs/tribe)
3. Create a pull request in GitHub: [https://github.com/DistrictDataLabs/tribe/pulls](https://github.com/DistrictDataLabs/tribe/pulls)

Note that labels in the GitHub issues are defined in the blog post: [How we use labels on GitHub Issues at Mediocre Laboratories](https://mediocre.com/forum/topics/how-we-use-labels-on-github-issues-at-mediocre-laboratories).

If you are a member of the District Data Labs Faculty group, you have direct access to the repository, which is set up in a typical production/release/development cycle as described in _[A Successful Git Branching Model](http://nvie.com/posts/a-successful-git-branching-model/)_. A typical workflow is as follows:

1. Select a card from the [dev board](https://waffle.io/DistrictDataLabs/tribe), preferably one that is "ready", then move it to "in-progress".

2. Create a branch off of develop called "feature-[feature name]", work and commit into that branch.

~$ git checkout -b feature-myfeature develop

3. Once you are done working (and everything is tested), merge your feature into develop.

~$ git checkout develop
~$ git merge --no-ff feature-myfeature
~$ git branch -d feature-myfeature
~$ git push origin develop

4. Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.

## Contributors

Thank you for all your help contributing to make Tribe a great project!

### Maintainers

- Benjamin Bengfort: [@bbengfort](https://github.com/bbengfort/)

### Contributors

- Your name welcome here!

## Changelog

The release versions that are sent to the Python package index (PyPI) are also tagged in GitHub. You can see the tags through the GitHub web application and download the tarball of the version you'd like.

The versioning uses a three-part system, "a.b.c". "a" represents a major release that may not be backwards compatible. "b" is incremented on minor releases that may contain extra features but are backwards compatible. "c" releases are bug fixes or other micro changes that developers should feel free to update to immediately.
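
To make the scheme concrete, here is a minimal sketch of how a script might compare two "a.b.c" version strings. The function names `parse_version` and `is_safe_upgrade` are hypothetical and not part of Tribe; the rule that "same major number means compatible" is simply this section's description turned into code.

```python
def parse_version(version):
    """Split an "a.b.c" string into a (major, minor, micro) tuple of ints."""
    major, minor, micro = (int(part) for part in version.split("."))
    return major, minor, micro


def is_safe_upgrade(current, candidate):
    """True when candidate is newer than current and keeps the same major number."""
    cur, new = parse_version(current), parse_version(candidate)
    # Tuples compare element by element, so (1, 1, 3) > (1, 1, 2).
    return new > cur and new[0] == cur[0]
```

Under this reading, moving from "1.1.2" to "1.1.3" counts as a safe upgrade, while moving to "2.0.0" does not, because the major number changes.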

### Version 1.1.2

* **tag**: [v1.1.2](https://github.com/DistrictDataLabs/tribe/releases/tag/v1.1.2)
* **release**: Thursday, November 20, 2014
* **deployment**: Friday, March 11, 2016
* **commit**: [69fe3c6](https://github.com/DistrictDataLabs/tribe/commit/69fe3c69130899479be2e33f73872d6cfedd4659)

This is the initial release of Tribe that has been used for teaching since the first SNA workshop in 2014. This version was cleaned up a bit, with extra dependency removal and better organization. This is also the first version that was deployed to PyPI.

<!-- References -->
[pypi_img]: https://badge.fury.io/py/ddl-tribe.svg
[pypi_href]: https://badge.fury.io/py/ddl-tribe
[pypi_img]: https://badge.fury.io/py/tribe.svg
[pypi_href]: https://badge.fury.io/py/tribe
[travis_img]: https://travis-ci.org/DistrictDataLabs/tribe.svg?branch=master
[travis_href]: https://travis-ci.org/DistrictDataLabs/tribe/
[coveralls_img]: https://coveralls.io/repos/github/DistrictDataLabs/tribe/badge.svg?branch=master
@@ -83,5 +134,5 @@ To work with this code, you'll need to do a few things to set up your environmen
[health_href]: https://landscape.io/github/DistrictDataLabs/tribe/master
[waffle_img]: https://badge.waffle.io/DistrictDataLabs/tribe.png?label=ready&title=Ready
[waffle_href]: https://waffle.io/DistrictDataLabs/tribe
[rtfd_img]: https://readthedocs.org/projects/ddl-tribe/badge/?version=latest
[rtfd_href]: http://ddl-tribe.readthedocs.org/
[rtfd_img]: http://readthedocs.org/projects/ddl-tribe/badge/?version=latest
[rtfd_href]: http://ddl-tribe.readthedocs.org/en/latest/
57 changes: 57 additions & 0 deletions docs/about.md
@@ -0,0 +1,57 @@
# About

Tribe is a utility that lets you extract a network (a graph) from a communication medium that we all use often: our email. Tribe is designed to read an email mbox (a format that Python's standard library can read natively) and write the resulting graph to a GraphML file on disk. This utility is generally used for District Data Labs' Graph Analytics with Python and NetworkX course, but can be used by anyone interested in studying networks.

## Contributing

Tribe is open source, and I'd love your help. If you would like to contribute, you can do so in the following ways:

1. Add issues or bugs to the bug tracker: [https://github.com/DistrictDataLabs/tribe/issues](https://github.com/DistrictDataLabs/tribe/issues)
2. Work on a card on the dev board: [https://waffle.io/DistrictDataLabs/tribe](https://waffle.io/DistrictDataLabs/tribe)
3. Create a pull request in GitHub: [https://github.com/DistrictDataLabs/tribe/pulls](https://github.com/DistrictDataLabs/tribe/pulls)

Note that labels in the GitHub issues are defined in the blog post: [How we use labels on GitHub Issues at Mediocre Laboratories](https://mediocre.com/forum/topics/how-we-use-labels-on-github-issues-at-mediocre-laboratories).

If you are a member of the District Data Labs Faculty group, you have direct access to the repository, which is set up in a typical production/release/development cycle as described in _[A Successful Git Branching Model](http://nvie.com/posts/a-successful-git-branching-model/)_. A typical workflow is as follows:

1. Select a card from the [dev board](https://waffle.io/DistrictDataLabs/tribe), preferably one that is "ready", then move it to "in-progress".

2. Create a branch off of develop called "feature-[feature name]", work and commit into that branch.

~$ git checkout -b feature-myfeature develop

3. Once you are done working (and everything is tested), merge your feature into develop.

~$ git checkout develop
~$ git merge --no-ff feature-myfeature
~$ git branch -d feature-myfeature
~$ git push origin develop

4. Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.

## Contributors

Thank you for all your help contributing to make Tribe a great project!

### Maintainers

- Benjamin Bengfort: [@bbengfort](https://github.com/bbengfort/)

### Contributors

- Your name welcome here!

## Changelog

The release versions that are sent to the Python package index (PyPI) are also tagged in GitHub. You can see the tags through the GitHub web application and download the tarball of the version you'd like.

The versioning uses a three-part system, "a.b.c". "a" represents a major release that may not be backwards compatible. "b" is incremented on minor releases that may contain extra features but are backwards compatible. "c" releases are bug fixes or other micro changes that developers should feel free to update to immediately.

### Version 1.1.2

* **tag**: [v1.1.2](https://github.com/DistrictDataLabs/tribe/releases/tag/v1.1.2)
* **release**: Thursday, November 20, 2014
* **deployment**: Friday, March 11, 2016
* **commit**: [69fe3c6](https://github.com/DistrictDataLabs/tribe/commit/69fe3c69130899479be2e33f73872d6cfedd4659)

This is the initial release of Tribe that has been used for teaching since the first SNA workshop in 2014. This version was cleaned up a bit, with extra dependency removal and better organization. This is also the first version that was deployed to PyPI.
46 changes: 46 additions & 0 deletions docs/emails.md
@@ -0,0 +1,46 @@
# Exporting an MBox from Email

One easy place to obtain a communications network for graph analysis is your email. Tribe extracts the relationships between unique email addresses by exploring who is connected through participation in the same email. In particular, we will use a common format for email storage called `mbox`. If you have Apple Mail, Thunderbird, or Microsoft Outlook, you should be able to export your mbox. If you have [Gmail](https://gmail.com), you may have to use an online email extraction tool.
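
To illustrate the idea of connecting addresses that take part in the same email, here is a minimal sketch built on Python's standard `mailbox` module. It is one plausible reading of that idea, not Tribe's actual extraction code: the function name `participant_edges` and the choice of the `From`, `To`, and `Cc` headers are assumptions made for illustration.

```python
import mailbox
from email.utils import getaddresses
from itertools import combinations


def participant_edges(mbox_path):
    """Yield (address, address) pairs for people who share a message.

    Hypothetical helper: every pair of distinct addresses found in the
    From, To, and Cc headers of one message becomes one undirected edge.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            headers = []
            for name in ("From", "To", "Cc"):
                headers.extend(message.get_all(name, []))
            # Lowercase the addresses so the same person is not counted twice.
            addresses = {addr.lower() for _, addr in getaddresses(headers) if addr}
            yield from combinations(sorted(addresses), 2)
    finally:
        box.close()
```

The generator yields one pair per message per pair of participants, so a pair that appears in many messages is yielded many times; counting the repeats would be one way to weight the edges.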

## Gmail or Google Apps

**Note: if you're taking the DDL Workshop, make sure you do this in advance of the class. It can take hours or even days for the archive to be created!**

1. Go to [https://takeout.google.com/settings/takeout](https://takeout.google.com/settings/takeout).
2. In the "select data to include" box, make sure Mail is turned on and everything else is turned off, then click Next.
3. Select your compression format (zip for Windows, tgz for Mac) and click Create Archive.
4. Once the archive has been created, you will receive an email notification.

## Outlook

1. Select the messages you would like to export, or the folder, if you would like to export the entire folder.
2. Click the MessageSave Outlook toolbar button.
3. Select "include subfolders" if you would like to export subfolders of the current folder as well.
4. Select "MBOX" in the "Format" field. Click "Save Now".
5. That's it. You should see mbox file(s) created in the destination directory.
6. MessageSave creates one file per Outlook folder processed.

## Thunderbird

1. Go to the [Import/Export Tools website](https://addons.mozilla.org/en-US/thunderbird/addon/importexporttools/).
2. Right-click on the download link and select "Save Target/Link As."
3. Save the ".xpi" file to your computer's hard disk and note the location.
4. Start up Thunderbird and select "Add-ons" from the "Tools" menu.
5. Click "Extensions" in the new window and click "Install."
6. Browse to your saved ImportExport Tools ".xpi" file and click "Open."
7. Click the "Install Now" button and close Thunderbird.
8. Restart Thunderbird and select "ImportExport Tools" from the "Tools" menu. Click "Options."
9. Select the "Export Directories" tab. Check the box next to "Export folders as MBOX file."
10. Browse to the drive and folder to which you want to export your mbox files. Click "OK" twice.
11. Select "ImportExport Tools" from the "Tools" menu again. Click on "Export all the folders."
12. Choose a folder from Thunderbird's collective "Profiles" folder and its contents will be exported as mbox files.

## Apple Mail

1. Select one or more mailboxes to export.

To select mailboxes that are next to each other (contiguous) in the list, hold down Shift as you click the first and last mailbox. To select mailboxes that are not next to each other in the list, hold down Command as you click each mailbox.

2. Choose Mailbox > Export Mailbox, or choose Export Mailbox from the Action pop-up menu (looks like a gear) at the bottom of the sidebar.
3. Choose a folder or create a new folder where you want to store the exported mailbox, and then click Choose.

Mail exports the mailboxes as .mbox packages. If you previously exported a mailbox, Mail does not overwrite the existing .mbox file but appends a number to the filename of the new export to create a new version, such as My Mailbox 3.mbox.
24 changes: 23 additions & 1 deletion docs/index.md
@@ -1,3 +1,25 @@
# Tribe Documentation

Documentation to follow soon!
Social networks are not new, even though websites like Facebook and Twitter might make you want to believe they are (and trust me, I’m not talking about Myspace!). Social networks are extremely interesting models for human behavior, whose study dates back to the early twentieth century. However, because of those websites, data scientists have access to much more data than the anthropologists who studied the networks of tribes!

Because networks take a relationship-centered view of the world, the data structures that we will analyze model real-world behaviors and communities. Through a suite of algorithms derived from mathematical graph theory, we are able to compute and predict the behavior of individuals and communities. Clearly this has a number of practical applications, from recommendation to law enforcement to election prediction, and more.

Tribe is a utility that lets you extract a network (a graph) from a communication medium that we all use often: our email. Tribe is designed to read an email mbox (a format that Python's standard library can read natively) and write the resulting graph to a GraphML file on disk. This utility is generally used for District Data Labs' Graph Analytics with Python and NetworkX course, but can be used by anyone interested in studying networks.

## Quick Start

1. Download your data. See [Exporting an MBox from Email](emails.md) for more information on how to accomplish this.

2. Install the tribe utility with `pip`:

$ pip install tribe

3. If you would like to develop for tribe, please see the instructions in the README.

4. Extract a graph from your email MBox as follows:

$ tribe-admin.py extract -w myemails.graphml myemails.mbox

5. Be patient, this could take some time: on my MacBook Pro it took 12 minutes to perform the complete extraction on a 7.5 GB MBox.

You're now ready to get started analyzing your email network!
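
One simple first analysis is degree centrality: the fraction of other people each address is directly connected to. The sketch below works on a plain list of undirected edges and uses no third-party libraries; `degree_centrality` here is a hypothetical stand-alone function written for illustration, and the example addresses are made up.

```python
def degree_centrality(edges):
    """Map each node to (number of distinct neighbors) / (n - 1).

    `edges` is an iterable of (source, target) pairs treated as undirected.
    Self-loops are ignored, so a node that appears only in self-loops is
    left out of the result.
    """
    neighbors = {}
    for source, target in edges:
        if source == target:
            continue
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}
```

For the edge list `[("ann", "bob"), ("ann", "cat")]`, "ann" is connected to both other people and scores 1.0, while "bob" and "cat" each score 0.5.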
13 changes: 12 additions & 1 deletion mkdocs.yml
@@ -1 +1,12 @@
site_name: My Docs
site_name: Tribe
repo_name: GitHub
repo_url: https://github.com/DistrictDataLabs/tribe
site_description: Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis.
site_author: District Data Labs
copyright: Built by District Data Labs, licensed by <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a>
theme: readthedocs

pages:
- "Introduction": index.md
- "Acquiring an Email MBox": emails.md
- "About Tribe": about.md
1 change: 1 addition & 0 deletions requirements.txt
@@ -13,6 +13,7 @@ unicodecsv==0.14.1

## Other Dependencies
# networkX Dependencies
#scipy==0.17.0
decorator==4.0.9
# confire Dependencies
PyYAML==3.11
2 changes: 1 addition & 1 deletion tests/__init__.py
@@ -24,7 +24,7 @@
## Module Constants
##########################################################################

TEST_VERSION = "1.1.2" ## Also the expected version of the package
TEST_VERSION = "1.1.3" ## Also the expected version of the package

##########################################################################
## Initialization Tests
2 changes: 1 addition & 1 deletion tribe/version.py
@@ -20,7 +20,7 @@
__version_info__ = {
'major': 1,
'minor': 1,
'micro': 2,
'micro': 3,
'releaselevel': 'final',
'serial': 0,
}