Create Audio Feature in SDK#344
Conversation
nickyfantasy
commented
Mar 28, 2018
- Add apis to record audio in SDK
- Add corresponding apis in pybind, storage.py, sdk.h
- Implement reservoir sampling when collecting audio samples
* Add apis to record audio in SDK * Add corresponding apis in pybind, storage.py, sdk.h * Implement reservoir sampling when collecting audio samples
|
@nickyfantasy Please run pre-commit to format the code. The Travis CI will check the format. |
visualdl/logic/sdk.cc
Outdated
| void Audio::SetSample(int index, | ||
| int sample_rate, | ||
| const std::vector<value_t>& data) { | ||
| CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100"; |
There was a problem hiding this comment.
should be a positive number?
visualdl/logic/sdk.cc
Outdated
| int sample_rate, | ||
| const std::vector<value_t>& data) { | ||
| CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100"; | ||
| CHECK_LT(index, num_samples_); |
There was a problem hiding this comment.
We can add error messages for these 2 as well.
visualdl/logic/sdk.h
Outdated
| struct AudioRecord { | ||
| int step_id; | ||
| int sample_rate; | ||
| std::vector<int> data; |
There was a problem hiding this comment.
So here the data is 'int' instead of 'float'?
There was a problem hiding this comment.
yes, when we write data in, we convert float to string, when we read the data out, we convert binary to int
| << "g_log_dir should be set in LogReader construction"; | ||
| BinaryRecordReader brcd(GenBinaryRecordDir(g_log_dir), filename); | ||
|
|
||
| std::transform(brcd.data.begin(), |
There was a problem hiding this comment.
is brcd.data the same as res.data?
There was a problem hiding this comment.
brcd.data is the data in string format when we saved in file, when we read the data we convert to integer that becomes res.data
visualdl/logic/sdk.h
Outdated
|
|
||
| /* | ||
| * A combined interface for IsSampleTaken and SetSample, simpler but might be | ||
| * low effience. |
visualdl/python/storage.py
Outdated
|
|
||
| def audio(self, tag): | ||
| """ | ||
| Get a audio reader with tag |
visualdl/logic/pybind.cc
Outdated
| auto tablet = self.tablet(tag); | ||
| return vs::components::ImageReader(self.mode(), tablet); | ||
| }) | ||
| .def("get_image", [](vs::LogReader& self, const std::string& tag) { |
There was a problem hiding this comment.
these seem just some code style formatting, and we each other have a different clang-format config that results in some diff, maybe we need a unified version of config that makes we have the same code style?
In paddle, the clang-format 3.8 and google c++ style is used, different config and version may lead to some diff.
We can reference paddle's .clang-format configuration
There was a problem hiding this comment.
yes, I just updated clang-format
visualdl/logic/sdk.cc
Outdated
| CHECK_LE(index, num_records_); | ||
|
|
||
| //convert float vector to char vector | ||
| std::vector<char> data_str(data.size()); |
There was a problem hiding this comment.
it seems that data_str can directly be a string and no need to tranform from vector to string again.
std::string data_str(data.size());
...
BinaryRecord brcd(xxdir, std::move(data_str));There was a problem hiding this comment.
ok, I end up just use std::string(data.begin(),data.end()) to directly convert the data vector to string
visualdl/logic/sdk.h
Outdated
| struct AudioRecord { | ||
| int step_id; | ||
| int sample_rate; | ||
| std::vector<int> data; |
There was a problem hiding this comment.
To meet the audio value interval, is short or unsigned char or char is enough?
https://cn.mathworks.com/help/matlab/ref/audiorecorder.getaudiodata.html?s_tid=gn_loc_drop
here just int16, int8, uint8 are used, not int32.
Just a suggestion, not that important, int works good.
There was a problem hiding this comment.
you are right, when we were doing speech recognition app before, we just use byte / int8
| .def("total_records", &cp::TextReader::total_records) | ||
| .def("size", &cp::TextReader::size); | ||
|
|
||
| py::class_<cp::Audio>(m, "AudioWriter", R"pbdoc( |
There was a problem hiding this comment.
Will it be weird to have documentations published on the website but not the code is not in the release pip? I am not sure what's the best approach here.
visualdl/logic/pybind.cc
Outdated
| )pbdoc"); | ||
|
|
||
| py::class_<cp::AudioReader::AudioRecord>(m, "AudioRecord") | ||
| // TODO(ChunweiYan) make these copyless. |
There was a problem hiding this comment.
Either remove the TODO or update it to yours
visualdl/logic/sdk.cc
Outdated
| num_records_ = 0; | ||
| } | ||
|
|
||
| int Audio::IsSampleTaken() { |
There was a problem hiding this comment.
Minor stuff, the function name is implying that the function will return a BOOL, but the function returns an index.
Maybe rename the function to NextRandSampleIndex or provide a comment here to explain the logic.