Voice translation Qt

Latest revision as of 13:46, 8 January 2023

Status

You can check on this page what is still missing for each locale.

Get the voices

You can browse the voices in the online git repository (https://invent.kde.org/education/gcompris-data) or clone it:

git clone https://invent.kde.org/education/gcompris-data.git

Create your voices directory

First, copy the English voices as a template into a new directory named after your locale (e.g. my):

cd gcompris-data/voices
cp -r en my

Provide your voice translation in gcompris-data/voices/my for each English voice.
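Because the new directory starts as a copy of the English one, a file that is still byte-identical to its English template has probably not been re-recorded yet. The sketch below lists such files. The function name list_untranslated is hypothetical, and the byte-comparison heuristic is an assumption for illustration, not part of the GCompris tooling.

```shell
#!/bin/sh
# list_untranslated TEMPLATE_DIR LOCALE_DIR
# Hypothetical helper: prints the relative path of every .ogg file in
# LOCALE_DIR that is still byte-identical to the one in TEMPLATE_DIR.
list_untranslated() {
  template=$1
  locale=$2
  # Enumerate the template files with paths relative to TEMPLATE_DIR.
  ( cd "$template" && find . -type f -name '*.ogg' ) | sort |
  while IFS= read -r rel; do
    # cmp -s is silent and succeeds only when both files are identical.
    if [ -f "$locale/$rel" ] && cmp -s "$template/$rel" "$locale/$rel"; then
      printf '%s\n' "${rel#./}"
    fi
  done
}

# Example, run from gcompris-data/voices:
#   list_untranslated en my
```

An empty output would suggest that every template file has been replaced by a new recording.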

Recording / Encoding

It is best to choose somebody who speaks your language well and articulates clearly (a teacher is a good candidate).

You can find a lot of practical advice on LibriVox (https://librivox.org/pages/about-recording/) that can help improve the quality of the recording.

  • You can make the recording with Audacity (http://audacity.sourceforge.net/). It is mandatory to use mono WAV, 16 bit / 44100 Hz, to get the best quality/size ratio and because this works with any sound card. If you are using a Zoom H1, set the DIP switches underneath it to 'LO CUT=ON', 'AUTO LEVEL=ON' and 'REC FORMAT=WAV', and select WAV format 44/16 in the display panel.
  • Save the recording as a WAV file.
  • Normalize the WAV files if necessary. You may use a script like this to normalize the sound:
#!/bin/sh
# Work on copies so the original recordings stay untouched.
mkdir -p normalized
cp *.wav normalized/
cd normalized || exit 1
for i in *.wav; do
  normalize "$i"
done
  • In the directory where the normalized WAV files are, run a script like this to convert them to OGG:
#!/bin/sh
mkdir -p ogg
for f in *.wav; do
  # Encode to OGG, downmixing stereo to mono, then write the tags.
  oggenc -q3 -o "ogg/${f%.*}.ogg" --downmix "$f"
  vorbiscomment -w "ogg/${f%.*}.ogg" -t "ARTIST=<your name here>" -t "TITLE=GCompris" -t "COPYRIGHT=GPL V3+" -t "DATE=2015"
done

The ogginfo command should display the comment.

  • Copy the OGG files into the corresponding directory.
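Before copying, it can help to confirm that every WAV file actually produced an OGG file. A minimal sketch, assuming the ogg/ output directory used by the conversion script; the function name missing_ogg is hypothetical:

```shell
#!/bin/sh
# missing_ogg WAV_DIR OGG_DIR
# Hypothetical helper: prints the name of every .wav file in WAV_DIR
# that has no non-empty .ogg counterpart in OGG_DIR.
missing_ogg() {
  wav_dir=$1
  ogg_dir=$2
  for f in "$wav_dir"/*.wav; do
    # Skip the unexpanded pattern when there are no WAV files at all.
    [ -e "$f" ] || continue
    base=$(basename "$f")
    # ${base%.*} strips the extension, as in the conversion script.
    if [ ! -s "$ogg_dir/${base%.*}.ogg" ]; then
      printf '%s\n' "$base"
    fi
  done
}

# Example, run in the directory holding the normalized WAV files:
#   missing_ogg . ogg
```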

Splitting a single recorded file

You can record a whole session in a single file and then split it with Audacity. The first step is to convert it from stereo to mono (menu Tracks -> Stereo to Mono). Then label each word or sentence by selecting it and using the "Tracks -> Add Label at Selection" option (Ctrl+B).

You can label as many words as the file contains. Try to be precise and avoid unnecessary silence at the start and end of words. Once the labels are placed, use the "File -> Export Multiple" feature (Ctrl+Shift+L) and select the WAV format.

In the metadata dialog, put your name in Artist, set the album title to GCompris, fill in the Year, and add a Copyright tag with the value GPL V3+. You can then click 'Set Default' so that you do not have to enter this again. You can also disable the metadata dialog entirely in the export section of the global configuration.

Then normalize the output WAV files and convert them to OGG using the same scripts as explained in the previous section.

Tagging

If you don't have the correct tag information in the OGG files, you can retag them with:

#!/bin/sh
for f in *.ogg; do
  vorbiscomment -w "$f" -t "ARTIST=<your name here>" -t "TITLE=GCompris" -t "COPYRIGHT=GPL V3+" -t "DATE=2015"
done
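To find out which files need retagging, the tag listing printed by vorbiscomment -l can be checked for the expected fields. The sketch below reads such a listing on standard input. The function name missing_tags is hypothetical, and the set of required tags (ARTIST, TITLE, COPYRIGHT, DATE) is an assumption derived from the tagging command on this page.

```shell
#!/bin/sh
# missing_tags
# Hypothetical helper: reads a NAME=value tag listing on stdin (for
# example the output of `vorbiscomment -l file.ogg`) and prints the
# name of every expected tag that is absent.
missing_tags() {
  listing=$(cat)
  for tag in ARTIST TITLE COPYRIGHT DATE; do
    # Vorbis comment field names are case-insensitive, hence grep -i.
    if ! printf '%s\n' "$listing" | grep -qi "^${tag}="; then
      printf '%s\n' "$tag"
    fi
  done
}

# Example:
#   for f in *.ogg; do
#     m=$(vorbiscomment -l "$f" | missing_tags)
#     [ -n "$m" ] && echo "$f is missing:" $m
#   done
```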

Normalizing

The volume may differ from one file to another. It is better to normalize the WAV files, as explained in the previous sections. If you no longer have the WAV files, you can still normalize the OGG files, but the result will not be as good as normalizing the WAV files. To normalize the OGG files, you can use a script like this:

#!/bin/sh
for f in *.ogg; do
  normalize-ogg "$f"
done

Stereo to Mono

As with normalizing, it is better to convert from stereo to mono on the WAV files. If you missed this step and only have the OGG files available, you can still convert them to mono with:

#!/bin/sh
for f in *.ogg; do
  # Decode to a WAV file next to the OGG, then re-encode it as mono over the original.
  oggdec "$f"
  oggenc -q3 -o "$f" --downmix "${f%.*}.wav"
  vorbiscomment -w "$f" -t "ARTIST=<your name here>" -t "TITLE=GCompris" -t "COPYRIGHT=GPL V3+" -t "DATE=2015"
done
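To see which OGG files are still stereo, the channel count can be read from the ogginfo report. The sketch below extracts it from text on standard input. It assumes the report contains a line of the form "Channels: N"; the function name channel_count is hypothetical.

```shell
#!/bin/sh
# channel_count
# Hypothetical helper: reads an ogginfo report on stdin and prints the
# number from the first "Channels: N" line (assumed report format).
channel_count() {
  sed -n 's/^[[:space:]]*Channels:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example:
#   for f in *.ogg; do
#     [ "$(ogginfo "$f" | channel_count)" = "1" ] || echo "$f is not mono"
#   done
```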

Alphabet

The English alphabet directory contains files with names like U0030.ogg. These are the voices for each single letter in your locale, named after the Unicode code point of the character. For example, U+0030 is the character 0, and U+0069 is the character i. You can get the table for each subset.

Warning: We only need the lower case version of each letter. For example, we have U+0061 (letter 'a') but we don't need U+0041 (letter 'A'). The same applies to accented letters.
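The file name for a given character can be derived with printf, as in the sketch below. The function name letter_file_name is hypothetical, and the four-digit uppercase hexadecimal form is an assumption based on the U0030.ogg example. The leading-quote form of printf is reliable for ASCII characters; for accented or non-Latin letters the result depends on the shell and locale, so check those against a Unicode table.

```shell
#!/bin/sh
# letter_file_name CHAR
# Hypothetical helper: prints the voice file name for one character,
# e.g. "a" gives U0061.ogg.
letter_file_name() {
  # A leading quote makes printf use the numeric value of the character.
  printf 'U%04X.ogg\n' "'$1"
}

# Example:
#   letter_file_name a
```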

Lang word list

The lang activity contains a set of about 1000 images that originally come from the art4apps project (http://www.art4apps.org/) and are released under CC-BY-SA. To make it available in your language, you must provide a voice recording of each word and a UTF-8 encoded file named content-<your_locale>.json that contains the translation of each word as spoken in the OGG file. The OGG files are in the voice directory <locale>/words and the JSON file is in the source code under /src/activities/lang/resource/content-<your-locale>.json. If your language has genders, it is mandatory to include them in the recording and in the content file.

You can see the images and their English names on a single page (http://gcompris.net/incoming/lang/words.html).

Shipping

Once done, the easiest way is to tar all these files and send them to the GCompris maintainer.

tar -cvzf voices_my.tgz my
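Before sending the archive, a quick sanity check is to count the OGG files it contains and compare that with the number in your directory. A minimal sketch; the function name count_ogg_in_archive is hypothetical:

```shell
#!/bin/sh
# count_ogg_in_archive ARCHIVE
# Hypothetical helper: prints how many .ogg entries a gzipped tar
# archive contains.
count_ogg_in_archive() {
  tar -tzf "$1" | grep -c '\.ogg$'
}

# Example, to compare against the source directory:
#   count_ogg_in_archive voices_my.tgz
#   find my -type f -name '*.ogg' | wc -l
```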

Integration

The voices are packaged in an rcc file. You can test it by following these instructions.

To create the rcc file, use the script update_voices.sh in gcompris-data/voices (on Linux, you can comment out the lines that generate the aac and mp3 versions). It creates the rcc files for all languages in .rcc/voices-ogg. To test your work, copy these files to $HOME/.cache/KDE/gcompris-qt/data2/voices-ogg and run GCompris with the download option disabled; otherwise the version from gcompris.net will overwrite yours if one exists there.

Once done, the generated rcc must be uploaded to our download server by someone with admin rights.
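The copy step can be sketched as a small function. The name install_voices is hypothetical; the two paths in the example are the ones given above, and matching on the .rcc extension is an assumption about how the generated files are named.

```shell
#!/bin/sh
# install_voices SRC_DIR DEST_DIR
# Hypothetical helper: copies every .rcc file from SRC_DIR into
# DEST_DIR, creating DEST_DIR first if it does not exist.
install_voices() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for f in "$src"/*.rcc; do
    # Skip the unexpanded pattern when no rcc file was generated.
    [ -e "$f" ] || continue
    cp "$f" "$dest/" || return 1
  done
}

# Example, run from gcompris-data/voices after update_voices.sh:
#   install_voices .rcc/voices-ogg "$HOME/.cache/KDE/gcompris-qt/data2/voices-ogg"
```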