Merged
943a90c to
5871e04
Compare
luotao1
reviewed
Jun 1, 2018
framework::OpDesc op_desc(op, nullptr, nullptr);
PADDLE_ENFORCE_EQ(op_desc.Input("X").size(), 1);
PADDLE_ENFORCE_EQ(op_desc.Input("Y").size(), 1);  // Y is a weight
PADDLE_ENFORCE_EQ(op_desc.Output("Out").size(), 1);  // Y is a weight
for (int h = 0; h < shape.h(); ++h) {
  for (int w = 0; w < shape.w(); ++w) {
    odata[h * ostrides.h() + w * ostrides.w()] =
        idata[h * ostrides.h() + w * ostrides.w()];
Contributor
Judging from the implementation, this is just odata[i] = idata[i], so is the conversion still needed?
Contributor
Author
This was copied from TF. I tried it out and, oddly enough, this function seems to be required. There is also a Reorder4 function; I will look at it more closely when it is needed later.
Contributor
When Reorder4 is written later, this part could be optimized a little further:
for (int h = 0; h < shape.h(); ++h) {
  for (int w = 0; w < shape.w(); ++w) {
    int index = h * ostrides.h() + w * ostrides.w();
    odata[index] = idata[index];
  }
}
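The question in this thread turns on the strides. As quoted, both sides are indexed with ostrides, so the loop is an element-wise copy. A reorder only changes the layout when the input is read with its own strides. The following is a minimal sketch of that idea, not the PR's actual implementation: `Dims2`, `Reorder2Sketch`, `ReorderCKtoKCSketch`, and the `istrides` parameter are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D extent/stride pair; not the type used in the PR.
struct Dims2 {
  int h;
  int w;
};

// Copies an h x w block, reading idata with istrides and writing odata
// with ostrides. If both stride pairs are equal, this is a plain copy.
template <typename T>
void Reorder2Sketch(Dims2 shape, const T* idata, Dims2 istrides, T* odata,
                    Dims2 ostrides) {
  for (int h = 0; h < shape.h; ++h) {
    for (int w = 0; w < shape.w; ++w) {
      odata[h * ostrides.h + w * ostrides.w] =
          idata[h * istrides.h + w * istrides.w];
    }
  }
}

// Converts a row-major C x K matrix into a row-major K x C matrix.
template <typename T>
std::vector<T> ReorderCKtoKCSketch(const std::vector<T>& ck, int c, int k) {
  assert(ck.size() == static_cast<std::size_t>(c) * k);
  std::vector<T> kc(ck.size());
  // Input element (c_i, k_i) sits at c_i * k + k_i.
  // Output element (k_i, c_i) sits at k_i * c + c_i.
  Reorder2Sketch<T>({k, c}, ck.data(), {1, k}, kc.data(), {c, 1});
  return kc;
}
```

With equal input and output strides the helper degenerates to the copy the reviewer pointed out. With the transposed strides shown in `ReorderCKtoKCSketch` it rearranges the weights.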
namespace inference {
namespace tensorrt {

template <typename T>
Contributor
Please add comments describing what Reorder2 and ReorderCKtoKC do and what their parameters mean.
PADDLE_ENFORCE_NOT_NULL(Y_v);
auto* Y_t = Y_v->GetMutable<framework::LoDTensor>();
// This may trigger a CPU->GPU copy.
// TODO(Superjomn) use some smarter mutable_data.
Contributor
If Y_v is read from the scope, it can already be on the GPU from the start, so no copy is needed here.
void DeclOutputVar(const std::string& name, const nvinfer1::Dims& dims) {
  DeclVar(name, dims);
}
Contributor
DeclParamVar and DeclOutputVar are exactly the same. Do we really need two separate wrapper functions?
Contributor
Author
The call sites do need different semantics. This distinguishes program from output; otherwise both places in the code would just call DeclVar and you could not tell them apart.
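The design choice described in this reply can be sketched as two intent-revealing wrappers over one shared helper. This is only an illustration under stated assumptions: `ValidatorSketch`, `DimsSketch` (a stand-in for `nvinfer1::Dims`), `HasVar`, and the storage map are hypothetical and do not come from the PR.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for nvinfer1::Dims so the sketch needs no TensorRT.
using DimsSketch = std::vector<int>;

class ValidatorSketch {
 public:
  // Both wrappers forward to DeclVar; the names only document intent
  // at the call site.
  void DeclParamVar(const std::string& name, const DimsSketch& dims) {
    DeclVar(name, dims);
  }
  void DeclOutputVar(const std::string& name, const DimsSketch& dims) {
    DeclVar(name, dims);
  }

  // Hypothetical query helper, added only to make the sketch observable.
  bool HasVar(const std::string& name) const { return vars_.count(name) > 0; }

 private:
  // Shared implementation: records the variable and its dimensions.
  void DeclVar(const std::string& name, const DimsSketch& dims) {
    vars_[name] = dims;
  }

  std::map<std::string, DimsSketch> vars_;
};
```

A caller then writes `DeclParamVar("Y", ...)` and `DeclOutputVar("Out", ...)`, which reads differently even though the behavior is the same. Keeping `DeclVar` private is one way to force that distinction.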
for (const auto& output : op_desc_->OutputArgumentNames()) {
  std::vector<float> fluid_out;
  std::vector<float> trt_out(200);
  std::vector<float> trt_out(200, 2008.);
engine_.reset(new inference::tensorrt::TensorRTEngine(
    max_batch_, max_workspace, nullptr));
// TODO(Superjomn) parameters should be analyzed and passed in from
// outside.
luotao1
approved these changes
Jun 1, 2018