commcare-export is a command line tool designed to let you export your data from CommCare HQ in a simple, configurable, and reusable way.
commcare-export will automatically pull your form and case data from CommCare HQ and save it locally, typically to either an Excel spreadsheet or to a local database (like MySQL). You can configure it to pull data on a schedule to automatically keep your data up to date, or run it as a one-time data dump. Here are the basic steps:
The rest of this page describes how to use
commcare-export in more detail.
One major consideration is whether to export to a file (Excel, CSV, JSON, etc) or to a SQL database. Exporting to a file allows you to get a full results set without setting up a local database. However, the major advantage of exporting to a database is that
commcare-export supports incremental updates. By default, each time the
commcare-export command is run, it will only import results that were modified after the most recently imported form or case. This means that, for very large datasets, you do not need to export a full results set every time you need to access your data. This can be an incredibly powerful feature for a data pipeline. (See more information about Selecting Tools for CommCare Data Processing and Analysis here).
If you are looking to export into a SQL database, you must have this database set up, and you should know how to provide a URL for it.
This section guides you through getting your system ready to use the
commcare-export command-line tool.
Visit the Python for Windows page and download a package appropriate for your system. The following instructions assume that you have select Python 2.7, but this only affects the location of Python on your hard drive. This tool works with all recent version of Pythons. If you use a different version (i.e. Python 3.2) then you will need to replace each instance of "python27" with your version number (i.e. "python32")
pip(a program that automatically installs Python programs)
Install pip for windows (https://sites.google.com/site/pydatalog/python/pip-for-windows)
> c:\python27\scripts\pip install commcare-export
Some versions of MacOS come with outdated versions of Python. To be safe, you should upgrade; this is very easy. The best way to install software on a Mac is via Homebrew. To install Homebrew and then a modern version of Python, open a terminal and run the following two commands. The first installs Homebrew, interactively telling you each thing it is going to do before it does it. The second installs Python and any prerequisites. (BTW, Mac does not have apt-get)
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" brew install python
After that, usage of Homebrew is brew install <SOME PACKAGE>.
pip2 install commcare-export
If you are on a Linux-based operating system, you probably have an automatic package manager available. Lucky you! You probably already have Python installed or know how to install it, but if not, then you will need to invoke the package manager for your operating system. Here are some possible commands.
$ apt-get install python or $ yum install python or $ emerge python
pip(a program that automatically installs Python programs)
$ sudo easy_install pip
$ sudo pip install commcare-export
If you have installed an earlier version of the commcare-export tool and need to upgrade you can do the following:
> c:\python27\scripts\pip install -U commcare-export
sudo pip2 install -U commcare-export
In case you have any problem setting up python on MacOS, reference here: http://docs.python-guide.org/en/latest/starting/install/osx/
$ sudo pip install -U commcare-export
If you get an error "ValueError: Invalid column index 0" you should run this command to downgrade one of the python libraries:
$ pip install -U openpyxl==1.8.6
This section describes how to run the examples that come with commcare-export against a sample project from the CommCareHQ exchange. After working through it, you should have some familiarity with the, its configuration, and how it is used.
Download the examples demo-registrations.xlsx, demo-deliveries.xlsx and demo-pregnancy-cases.xlsx attached to this page. In the following, you will need to adjust the path to the file to be wherever you downloaded them and fill in the name of your project - the bits in bold. The following commands will run the examples in such a way that the "tables" will be simply printed.
Many different fields are available for output in this way in addition to what is shown in these examples. The fields can be determined form the JSON output as they follow the JSON structure down to each property. For instance, based on the form output here: https://www.commcarehq.org/a/YOUR_PROJECT/api/v0.5/form/ - you would be able to use form.case.@caseid or form.meta.username.
YOUR_PROJECT is the name that you gave to the project space for your application. The name of your project is listed on the first page, Dashboard, when you sign into CommCareHQ. You can also see your project space name in any url while working in your : www.commcarehq.org/a/yourprojectname
> C:\python27\scripts\commcare-export --output-format markdown --project YOUR_PROJECT --query demo-registrations.xlsx > C:\python27\scripts\commcare-export --output-format markdown --project YOUR_PROJECT --query demo-deliveries.xlsx > C:\python27\scripts\commcare-export --output-format markdown --project YOUR_PROJECT --query demo-pregnancy-cases.xlsx
$ commcare-export --output-format markdown --project YOUR_PROJECT --query demo-registrations.xlsx $ commcare-export --output-format markdown --project YOUR_PROJECT --query demo-deliveries.xlsx $ commcare-export --output-format markdown --project YOUR_PROJECT --query demo-pregnancy-cases.xlsx
Nearly the same command will save the output to an Excel Workbook.
> C:\python27\scripts\commcare-export --output-format xlsx --output registrations-export-results.xlsx --project YOUR_PROJECT --query demo-registrations.xlsx
(try modifying the command above to save the results of each of the examples to a file)
On MacOS / Linux:
(try coming up with this command yourself based on reading the Windows example above!)
When saving to SQL you should make sure all your sheet names (table names) and field names (column names) are lowercase since not all database correctly handle case.
See Exporting to SQL section for more details.
By choosing "
--output-format sql" and
"--output URL" you can save the output to a SQL database. First, let us try it with SQLite, a simple database that is almost certainly already installed and ready to be used on any modern computer, so you do not have to have anything set up to run these examples. In the following, when you see a lite that starts with "
sqlite>" it is mimicking what it is like to interact with the database, so you should type in the bits that come after that.
In these commands, the term
DATABASE_URL is a standard URL format for databases. If, for example, you have a PostgreSQL database hosted on
registrations that you would like to save into instead, the URL might look like
postgresql://db.example.com/registrations (you may need to include username and password in the URL, but fully explaining the format is beyond the scope of this tutorial).
> C:\python27\scripts\commcare-export --output-format sql --output DATABASE_URL --project YOUR_PROJECT --query demo-registrations.xlsx
$ commcare-export --output-format sql --output DATABASE_URL --project YOUR_PROJECT --query demo-registrations.xlsx
|See Exporting to SQL section for database URL formats and additional details.|
5. Synchronize recent changes periodically
Note: This only applies when exporting to a SQL database.
One expected way to use this tool is to run a weekly/daily/hourly export of just those forms and cases which have changed. To set this up you will need knowledge of how to set up scheduled tasks, so this tutorial will just show a single example command and trust that you are comfortable adjusting it to suit your needs.
When exporting to a SQL database the export tool can pick up where it left off after the previous run. This means that once you have completed an export all you have to do in order to get the most recent data is re-run the same command again.
This section describes in detail the Excel query file required for commcare-export. You should open one (or many) of the examples (demo-registrations.xlsx, demo-deliveries.xlsx, demo-pregnancy-cases.xlsx) and compare them to the explanations below.
Each sheet of the configuration workbook is the configuration for an output table. Each sheet should have a unique name. The name of the Excel sheet will become the name of your output table.
The sheet consists of a series of columns with names that are special to commcare-export. These column headers are required but also freeing - you may have any other columns you like as long as they do not have one of the reserved titles.
form, every form submitted to a project space will be returned. Most users will, at a minimum, want to download data for one particular form at a time. To select a particular form type, set Filter Name to
xmlns.exactand Filter Value to your form's XML namespace. You can find this for any given form by going to your form -> Advanced -> View (Source XML) and look for the address in the header of your XForm. See additional form filtering options here: List Forms API.
case, every case in a project's casedb will be returned. Most users will, at a minimum, want to download data for one case_type at a time. To download one particular case type, set Filter Name to
typeand Filter Value to the name of the case type you'd like to download. See additional case filtering fields here: List Cases API.
Source Field: Each row of this column specifies where the data of the corresponding Field should be extracted from. Note that form properties must be prefixed with
form. and case properties must be prefixed with
properties. (See more details about the proper format for Source Fields below in the How to Reference Fields in Forms and Cases section).
Alternate Source Field [N]: (optional) In instances where different source fields should be placed in the same output field (for example when renaming fields in the app) alternate source fields can be specified. Each Alternate Source Field
column must be named Alternate Source Field [N] where "[N]" is replaced by a number. For example:
|Source Field||Alternate Source Field 1||Alternate Source Field 2|
The source fields are each considered in order and the first one that appears in the data is used (even if it's value is empty).
Table Name: (optional) By default the table name is taken as the Excel Sheet name. This can be overridden by adding this column with a single value in the column for the table name.
Include Referenced Items: (optional) (advanced) Each item in this column is a referenced id for which the entire object will be included. For example, when fetching from the case API, if you put
xforms into this column, then instead of just getting the
xform_ids, you will get the entire forms in a property called
xforms. For cases, you can include:
child_cases, parent_cases, xforms. For forms, you can include:
Including referenced items in the results may decrease the performance of the tool significantly due to the additional data that is being requested which must be fetched separately from the main form or case data.
Here is an example:
One of the easiest ways to confirm form and case field names is by looking at the column name in a regular form/case export from CommCareHQ.
See also this reference page on commonly exported fields called Export Field Reference
In general, the name of form and case fields matches those that you wrote when you created your application, with predictable modifications. You should open an example Excel configuration file right now to compare what you see there with the explanations below. When referencing properties in the Source Field column, ensure that you've typed them in exactly as they're defined in CommCare - case sensitivity matters.
Top-Level Form Fields: A field named FIELD in a form is referenced by
form.FIELD. For example, the field
age is referenced by
form.age in an Excel config file.
Top-Level Case Fields: A field name FIELD in a case is referenced by
properties.FIELD. For example, the field
delivery_date is referenced by
properties.delivery_date in an Excel config file.
Nested Fields: A field nested more deeply within a form or case may be referenced by separating each level of nesting with a period. For example in demo-deliveries.xlsx you will see the nested field
form.delivery_information.delivery_type which will pull out the value the delivery_type question inside the group delivery_information. This will always match the structure of your application.
(Power Users / Programmers Only) To find out exactly what fields are available, you must view the raw output of the CommCareHQ Case List API and Form List API. You can then reference these fields using an arbitrary JSONPath into the structure.
The following are functions which can be used to change the value of a column. They can be added to the Map Via column in the Excel query.
|Function name||Description||Example usage||Since version|
|count-selected||Count the number of options selected for a multi-select value.|
Get the option at the given position for a multi-select value. Note: the position index is zero-indexed.
This means that to choose the first item, you need to enter 0, the second item, 1, etc.
|selected||Checks to see if a value was selected from a multi-select value. Outputs 'True' or 'False'.|
|attachment_url||Converts a value into a URL that can be used to download attachments|
|Data Source||Filter Name||Filter Value||Field||Source Field||Map Via|
|Database name||URL format||Maximum field name length||Dependencies|
Requires the 'psycopg2' library:
Requires the 'pymysql' library:
Requires the 'pyodbc' library:
Also requires the SQL Server drivers to be installed.
This is only relevant if you're exporting to SQL.
By default the export tool only supports exporting strings, integer values and boolean values. In order to support dates as well as conversion of other types the following functions can be used in the Map Via column:
Convert a string value to a boolean. The following values result in a True output:
'true', 't', '1' (case doesn't matter)
If the value is already a boolean then it will be output without change. All other values result in a False output.
|bool2int||This supports converting boolean values and booleans represented as strings to the integers 1 or 0.|
|str2num||Convert a string representation of a number to a number.|
|str2date||Convert a string representation of a date or timestamp to a date or timestamp.|
You can also use the command line option "--strict-types" which will prevent SQL column types from being changed. This may result in errors if incompatible data is received from CommCare HQ.
Once you have a query file ready, it's time to run your tool on the command line! Here are the required arguments that you will need to provide:
|--output||Filename or path to SQL database|
|--project||Project space name (i.e. the PROJECT_NAME value in commcarehq.org/a/PROJECT_NAME/dashboard/project/)|
|--query||Filepath to Excel query file|
|--username||E-mail address used for CommCare log-in|
|--password||Password used for CommCare log-in|
Your query should look something like this for:
$ commcare-export --output-format xlsx --output my_excel_file.xlsx --project PROJECT_NAME --query my_excel_query_file.xlsx --username MY_COMMCARE_LOG_IN_EMAIL --password MY_COMMCARE_PASSWORD
$ commcare-export --output-format sql --output mysql+pymysql://MYSQL_USER:MYSQL_PASSWORD@127.0.0.1/MYSQL_DB_NAME?charset=utf8 --project PROJECT_NAME --query my_excel_query_file.xlsx --username MY_COMMCARE_LOG_IN_EMAIL --password MY_COMMCARE_PASSWORD
For a full list of flags, you can run
DATE value in format YYYY-MM-DD. Will export all data after this date
|--until||DATE value in format YYYY-MM-DD. Will export all data before this date|
|--verbose||Includes more detailed output to the user.|
Value to use when a field is missing from the form / case.
|--api-version||Specify the version of the CommCare API to use for exports|
|--start-over||When saving to a SQL database; the default is to pick up since the last success. This disables that.|
|--strict-types||When saving to a SQL database, don't allow changing column types once they are created.|
|--version||Output the version of commcare-export that you're using (useful for debugging)|
If you're using Two Factor Authentication on your domain, you will be required to add an additional argument to the command line prompt: --auth-mode apikey. An example query is below.
$ commcare-export --output-format xlsx --output my_excel_file.xlsx --project PROJECT_NAME --query my_excel_query_file.xlsx --username MY_COMMCARE_LOG_IN_EMAIL --auth-mode apikey --password API_Key
Instead of entering your password, you will be required to enter your API Key as defined under the "My Information" section of "My Account Settings."
commcare-exporttool will only export fields if they exist in the version of the application that the end user was using when they submitted each particular form and case. If you are exporting older forms or cases, keep in mind that their property names may be different from what is currently visible in the application.
commcare_export_runsdatabase table has been created, and that the query is visible there.