Use a quality microphone
Pick a quiet place with little background noise or disturbances
Have the "talent" speak clearly and slowly (slower than feels natural), with good enunciation, discernible breaks between words, and plenty of pauses. Have them speak slightly louder than usual ("project"), but not so much that it sounds unnatural
Record a short test take before starting to ensure equipment is functional and audio quality is good
Have all text to be recorded printed out and numbered
Begin recording. Have them read each phrase in order with a short pause after each (~3 seconds). Have them read the number (in English, if possible) before each phrase, with a short pause (~1 second) between the number and the phrase. The numbers will aid greatly in identifying which phrase is which, especially if they were recorded in a language other than your own.
Try to record all the phrases in one take (one audio file). Don't use a separate file for each phrase. If the recording is interrupted by background noise or the speaker makes a mistake, let the recorder keep running and continue when possible, starting again with that same phrase.
Do several takes.
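The numbered printout described above can be produced with a few lines of Python. This is only a sketch: the function name `numbered_script` and the sample phrases are hypothetical, not part of any CommCare tooling.

```python
def numbered_script(phrases, start=1):
    """Return the phrases as numbered lines, ready to print for the speaker."""
    return [f"{number}. {phrase}" for number, phrase in enumerate(phrases, start=start)]


# Hypothetical phrases; replace with the actual text to be recorded.
phrases = ["What is your name?", "How old are you?", "Thank you."]

for line in numbered_script(phrases):
    print(line)
```

Keeping the same numbering in the printout and in the spoken recording makes it easier to match each clip to its phrase later.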
Extracting and Splitting
First install Audacity and follow the instructions for downloading and configuring the LAME mp3 encoder. Then configure the MP3 encoding settings in Audacity (Edit -> Preferences -> File Formats -> Bit Rate). For speech, 64kbit mono encoding should be adequate. If the audio contains other noises or music, consider 96kbit mono. For very high quality applications (at a minimum, ones where the CommCare user will be using headphones), use 128kbit stereo. 64kbit mono requires roughly 8KB per second of audio. We use variable bit-rate encoding, which gives better quality for a given file size.
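As a rough guide to how the bit rate translates into file size, the arithmetic can be sketched as below. It assumes a constant bit rate and ignores headers and tags: 64 kbit/s works out to 8,000 bytes per second (about 7.8 KiB), and variable bit-rate files will differ somewhat. The function name is illustrative only.

```python
def estimate_mp3_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate audio size in bytes at a constant bit rate (no headers or tags)."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8  # 8 bits per byte


# Compare the three bit rates mentioned above for a 10-second clip.
for kbps in (64, 96, 128):
    size_kb = estimate_mp3_size_bytes(kbps, 10) / 1000
    print(f"{kbps} kbit/s, 10 s clip: about {size_kb:.0f} KB")
```

Multiplying the per-clip estimate by the number of phrases gives a quick check of whether the full audio set will fit on the target phones.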
Extract the recordings from the recording device.
Convert the recordings into .wav format if they are not already converted (most mp3 players have an option to create .wav files).
Open each .wav file in Audacity (an audio editing program)
You should be able to see the numbers and phrases clearly. For each phrase, select the portion you want to extract as the audio clip (with a slight pause both before and after the speaking). Skip attempts that are not usable. Among all the takes for each phrase, choose the best one and discard the rest.
Once selected, choose File --> Export Selection as MP3. Save under the new file name that you want. That's it!
This step makes all the recordings approximately the same volume. First download and install MP3Gain.
Open MP3Gain and choose "open file/folder" and open all the clips you want to use. Then do Gain --> Apply Constant Gain. Configure and tweak as needed.
Command Line Instructions
mp3gain -r -c -d 10 *.mp3 (assuming all the mp3s are in the current directory)
The -d 10 is a volume boost (here, 10dB) applied to all files after they have been normalized to the same volume. This is because the default volume level tends to sound quiet on the phones. Tailor the amount of boost to your deployment and the devices you will use. Each 10 dB of boost approximately doubles perceived loudness.
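To get a feel for what a given boost means, the decibel arithmetic can be sketched as follows. The amplitude formula is the standard definition of decibels for signal amplitude. The loudness formula simply encodes the "10 dB is about twice as loud" rule of thumb stated above and is an approximation, not a measurement. Both function names are illustrative.

```python
def amplitude_ratio(boost_db):
    """Factor by which the signal amplitude is multiplied for a boost in dB."""
    return 10 ** (boost_db / 20)


def approx_loudness_ratio(boost_db):
    """Rule-of-thumb perceived loudness factor: doubles for every 10 dB."""
    return 2 ** (boost_db / 10)


for db in (3, 6, 10):
    print(f"+{db} dB: amplitude x{amplitude_ratio(db):.2f}, "
          f"perceived loudness roughly x{approx_loudness_ratio(db):.2f}")
```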
Don't boost too much or clipping will occur (the strength of the signal is boosted beyond the maximum that the sound file can represent; the rest is 'clipped' off). Excessive clipping will sound harsh and severely degrade sound quality. You can view the amount of clipping in Audacity.
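The effect of clipping can be illustrated on raw sample values. The sketch below assumes 16-bit signed samples and made-up values; it shows the idea only and is not how mp3gain processes MP3 files internally.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a 16-bit signed sample


def apply_gain_with_clipping(samples, boost_db):
    """Boost samples by boost_db and clamp to the 16-bit range.

    Returns the boosted samples and the number of samples that were clipped.
    """
    factor = 10 ** (boost_db / 20)
    boosted = []
    clipped_count = 0
    for sample in samples:
        value = round(sample * factor)
        if value > INT16_MAX or value < INT16_MIN:
            clipped_count += 1
            value = max(INT16_MIN, min(INT16_MAX, value))
        boosted.append(value)
    return boosted, clipped_count


# Made-up samples: the two loudest exceed the 16-bit range after a 10 dB boost.
samples = [1000, -8000, 20000, -30000]
boosted, clipped = apply_gain_with_clipping(samples, 10)
print(boosted, f"{clipped} of {len(samples)} samples clipped")
```

Quiet samples scale cleanly, while loud ones are flattened at the limit, which is what produces the harsh sound.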