The recommendations below cover data processing and analysis for projects that need complex data transformations.
Every CommCare project must eventually interpret the Case and Form data collected by mobile workers. Two key decisions are embedded in the design of a data pipeline: the method of export (e.g. the basic export interface) and the automation of analysis (e.g. VBA queries). Both decisions should be revisited regularly in projects that plan to do large-scale data analysis (roughly 50,000 or more rows exported at a time).
Method of Export
|ID||Export method||Scale||Requirements|
|E1||Basic export interface||0 - 50,000 rows||None|
|E2||Daily Saved Exports*||0 - 500,000 rows||None|
|E3||CommCare Data Export Tool writes to an Excel file||10,000 - 500,000 rows|
|E4||CommCare Data Export Tool writes to a database||50,000 - 1,000,000+ rows|
*Note: Daily Saved Exports are pre-compiled data exports. This means that when you go to CommCare HQ, you can download fresh data immediately instead of waiting for a new file to be generated.
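As a rough illustration, the scale ranges in the table above can be encoded as a small lookup that lists which export methods cover an expected row count. This is only a sketch: the name `candidate_export_methods` and the data structure are hypothetical, the "1,000,000+" range is treated as having no upper bound, and the ranges are guidance rather than hard limits.

```python
# Row ranges copied from the export-method table: (name, low, high).
# high = None means the table gives no upper bound ("1,000,000+").
EXPORT_METHODS = {
    "E1": ("Basic export interface", 0, 50_000),
    "E2": ("Daily Saved Exports", 0, 500_000),
    "E3": ("CommCare Data Export Tool writes to an Excel file", 10_000, 500_000),
    "E4": ("CommCare Data Export Tool writes to a database", 50_000, None),
}


def candidate_export_methods(expected_rows):
    """Return the IDs of export methods whose listed scale covers expected_rows."""
    matches = []
    for method_id, (_name, low, high) in EXPORT_METHODS.items():
        if expected_rows >= low and (high is None or expected_rows <= high):
            matches.append(method_id)
    return matches


if __name__ == "__main__":
    print(candidate_export_methods(120_000))  # ['E2', 'E3', 'E4']
```

Where several methods overlap, the choice still depends on the requirements of each method and on how the analysis will be automated.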
Automation of analysis
|ID||Analysis approach||Export method||Scale||Requirements|
|A1||Export into Excel for manual analysis*||E1, E2, E3||0 - 200,000 rows|
|A2||Export into Excel and use macros for analysis||E1, E2, E3||1,000 - 200,000 rows|
|A3||Export into a CSV and use either a scripting language (Python, Ruby, Perl, etc.), a stats package (Stata, SPSS, SAS, R, MATLAB, etc.), or business intelligence software (Tableau, Google Fusion Tables) for analysis||E1, E2, E3||50,000 - 1,000,000+ rows|
|A4||Export into a database and use database queries (SQL, etc) for analysis||E4||50,000 - 1,000,000+ rows|
|A5||Export into a database and use a web service to dynamically query the database||E4||50,000 - 1,000,000+ rows|
*Note: Depending on the complexity of the indicators being calculated, this option may go beyond what pivot tables can do and may not be viable, regardless of the number of rows.
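To make the difference between a scripted analysis of a CSV export and a database query more concrete, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the column names (`case_id`, `owner_name`, `case_type`), the sample rows, the table name `cases`, and the use of an in-memory SQLite database as a stand-in for whatever database a real project would export into.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical export; real column names depend on the project's export configuration.
SAMPLE_CSV = """case_id,owner_name,case_type
c1,worker_a,pregnancy
c2,worker_a,child
c3,worker_b,pregnancy
"""


def count_by_column(csv_file, column):
    """Scripted approach: stream a CSV export and count rows per value of `column`."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[row[column]] += 1
    return dict(counts)


def load_into_sqlite(csv_file, conn, table="cases"):
    """Copy CSV rows into a SQLite table (a stand-in for a project database)."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    col_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        ([row[c] for c in columns] for row in reader),
    )
    conn.commit()


def cases_per_owner_sql(conn):
    """Database approach: let a SQL query do the aggregation."""
    cursor = conn.execute(
        "SELECT owner_name, COUNT(*) FROM cases "
        "GROUP BY owner_name ORDER BY owner_name"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(count_by_column(io.StringIO(SAMPLE_CSV), "owner_name"))
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(io.StringIO(SAMPLE_CSV), conn)
    print(cases_per_owner_sql(conn))
```

Both functions compute the same indicator. The scripted version reads the file one row at a time, so memory use stays flat as the export grows, while the SQL version moves the aggregation into the database, which is also where a web service would run its queries.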