INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 1, GPU 72 (MiB)
11:20:19 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +731, GPU +2, now: CPU 20115, GPU 1344 (MiB)
11:20:20 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:20 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2404ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.778871 seconds.
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 125 MiB
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 1, GPU 86 (MiB)
11:20:21 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +675, GPU +2, now: CPU 20076, GPU 1358 (MiB)
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3523ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.79518 seconds.
11:20:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 125 MiB
11:20:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 2, GPU 99 (MiB)
11:20:23 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +715, GPU +0, now: CPU 20108, GPU 1372 (MiB)
11:20:23 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2645ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.801743 seconds.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 125 MiB
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 2, GPU 112 (MiB)
11:20:25 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +749, GPU +0, now: CPU 20110, GPU 1386 (MiB)
11:20:25 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2596ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.821087 seconds.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 133 MiB
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 2, GPU 125 (MiB)
11:20:27 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +741, GPU +2, now: CPU 20114, GPU 1402 (MiB)
11:20:27 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2935ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.828896 seconds.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 146 MiB
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 2, GPU 138 (MiB)
11:20:28 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +715, GPU +2, now: CPU 20125, GPU 1416 (MiB)
11:20:29 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2517ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.780999 seconds.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 159 MiB
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 3, GPU 151 (MiB)
11:20:30 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +749, GPU +0, now: CPU 20115, GPU 1430 (MiB)
11:20:31 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 10 inputs and 6 output network tensors.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2858ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.773004 seconds.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 171 MiB
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 3, GPU 164 (MiB)
11:20:32 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +728, GPU +0, now: CPU 20107, GPU 1444 (MiB)
11:20:33 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 7 inputs and 8 output network tensors.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 379968 bytes
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 80 steps to complete.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.6269ms to assign 5 blocks to 80 nodes requiring 10485760 bytes.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 6769344 bytes
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.35131 seconds.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 183 MiB
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 7 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 4, GPU 181 (MiB)
11:20:35 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:35 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +717, GPU +2, now: CPU 20119, GPU 1464 (MiB)
11:20:35 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3196ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.766223 seconds.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 202 MiB
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 4, GPU 194 (MiB)
11:20:36 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:37 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +718, GPU +2, now: CPU 20118, GPU 1478 (MiB)
11:20:37 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3123ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.766578 seconds.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 215 MiB
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 4, GPU 207 (MiB)
11:20:38 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +713, GPU +2, now: CPU 20129, GPU 1492 (MiB)
11:20:39 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2769ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.736688 seconds.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 228 MiB
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 5, GPU 220 (MiB)
11:20:40 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +736, GPU +0, now: CPU 20132, GPU 1506 (MiB)
11:20:41 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:41 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2479ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.754449 seconds.
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 241 MiB
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 5, GPU 233 (MiB)
11:20:42 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +727, GPU +0, now: CPU 20153, GPU 1520 (MiB)
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3089ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.77584 seconds.
11:20:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 254 MiB
11:20:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 5, GPU 247 (MiB)
11:20:44 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:44 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +704, GPU +2, now: CPU 20148, GPU 1536 (MiB)
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3358ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.765545 seconds.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 268 MiB
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 6, GPU 260 (MiB)
11:20:46 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:46 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +729, GPU +2, now: CPU 20149, GPU 1550 (MiB)
11:20:46 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2889ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.766297 seconds.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 281 MiB
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 6, GPU 273 (MiB)
11:20:47 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:48 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +716, GPU +2, now: CPU 20159, GPU 1568 (MiB)
11:20:48 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 6 output network tensors.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2508ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.74689 seconds.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 293 MiB
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 6, GPU 286 (MiB)
11:20:49 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +743, GPU +0, now: CPU 20160, GPU 1582 (MiB)
11:20:50 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 8 inputs and 8 output network tensors.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 379968 bytes
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 81 steps to complete.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.5298ms to assign 6 blocks to 81 nodes requiring 10486272 bytes.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 7441088 bytes
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.26093 seconds.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 305 MiB
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 8 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 7, GPU 303 (MiB)
11:20:52 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +765, GPU +0, now: CPU 20210, GPU 1600 (MiB)
11:20:52 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3229ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.81125 seconds.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 324 MiB
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 7, GPU 316 (MiB)
11:20:53 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +744, GPU +2, now: CPU 20195, GPU 1616 (MiB)
11:20:54 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3399ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.805102 seconds.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 337 MiB
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 8, GPU 330 (MiB)
11:20:55 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +684, GPU +2, now: CPU 20198, GPU 1630 (MiB)
11:20:56 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.319ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.78612 seconds.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 351 MiB
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 8, GPU 343 (MiB)
11:20:57 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:20:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +749, GPU +2, now: CPU 20206, GPU 1644 (MiB)
11:20:58 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:20:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:20:58 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:20:58 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3791ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.781787 seconds.
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 364 MiB
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:20:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 8, GPU 356 (MiB)
11:20:59 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +696, GPU +0, now: CPU 20214, GPU 1658 (MiB)
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2631ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:21:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:21:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:21:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:01 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.77001 seconds.
11:21:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 377 MiB
11:21:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 8, GPU 369 (MiB)
11:21:01 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +751, GPU +0, now: CPU 20219, GPU 1672 (MiB)
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3085ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.78596 seconds.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 390 MiB
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 9, GPU 383 (MiB)
11:21:03 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +707, GPU +2, now: CPU 20226, GPU 1688 (MiB)
11:21:03 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3007ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.791843 seconds.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 404 MiB
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 9, GPU 396 (MiB)
11:21:04 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +715, GPU +2, now: CPU 20232, GPU 1702 (MiB)
11:21:05 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 6 output network tensors.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3023ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.783161 seconds.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 416 MiB
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 9, GPU 409 (MiB)
11:21:06 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +750, GPU +0, now: CPU 20250, GPU 1716 (MiB)
11:21:07 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:08 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 10 inputs and 6 output network tensors.
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 465008 bytes
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 108 steps to complete.
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.3912ms to assign 7 blocks to 108 nodes requiring 22052864 bytes.
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 10556224 bytes
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.84104 seconds.
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 428 MiB
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 11 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 10, GPU 440 (MiB)
11:21:09 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +746, GPU +0, now: CPU 20260, GPU 1750 (MiB)
11:21:10 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8445ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.19394 seconds.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 459 MiB
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 11, GPU 466 (MiB)
11:21:12 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +752, GPU +4, now: CPU 20272, GPU 1782 (MiB)
11:21:12 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:13 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:13 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:13 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8879ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18345 seconds.
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 485 MiB
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 11, GPU 492 (MiB)
11:21:14 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:15 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +757, GPU +4, now: CPU 20282, GPU 1810 (MiB)
11:21:15 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:15 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:15 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8397ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15668 seconds.
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 511 MiB
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:16 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 12, GPU 518 (MiB)
11:21:16 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +744, GPU +0, now: CPU 20291, GPU 1838 (MiB)
11:21:17 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.1741ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17524 seconds.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 537 MiB
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 12, GPU 544 (MiB)
11:21:18 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +703, GPU +0, now: CPU 20317, GPU 1866 (MiB)
11:21:19 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.0181ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15952 seconds.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 563 MiB
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 12, GPU 571 (MiB)
11:21:21 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +771, GPU +2, now: CPU 20358, GPU 1898 (MiB)
11:21:21 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:22 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:22 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9751ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18046 seconds.
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 590 MiB
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 13, GPU 597 (MiB)
11:21:23 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +720, GPU +2, now: CPU 20328, GPU 1926 (MiB)
11:21:24 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:24 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:24 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.881ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15464 seconds.
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 616 MiB
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 13, GPU 623 (MiB)
11:21:25 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +723, GPU +0, now: CPU 20340, GPU 1954 (MiB)
11:21:26 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 14 inputs and 4 output network tensors.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 77 steps to complete.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9905ms to assign 8 blocks to 77 nodes requiring 22053376 bytes.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18474 seconds.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 641 MiB
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 14, GPU 649 (MiB)
11:21:27 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:21:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +708, GPU +2, now: CPU 20354, GPU 1984 (MiB)
11:21:28 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 1 output network tensors.
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 263984 bytes
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 0 bytes
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 70 steps to complete.
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.6134ms to assign 8 blocks to 70 nodes requiring 21495808 bytes.
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 21495808 bytes
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 4384288 bytes
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15378 seconds.
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 23 MiB, GPU 658 MiB
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 1454467 bytes of compilation cache.
11:21:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 5053 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 5 MiB
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +20, now: CPU 14, GPU 673 (MiB)
11:21:30 torch_tensorrt._compile WARNING: Provided model is a torch.fx.GraphModule and retrace is False, inputs or arg_inputs is not necessary during save.
11:21:32 py.warnings WARNING: torch_tensorrt\dynamo\_exporter.py:396: UserWarning: Attempted to insert a get_attr Node with no underlying reference in the owning GraphModule! Call GraphModule.add_submodule to add the necessary submodule, GraphModule.add_parameter to add the necessary Parameter, or nn.Module.register_buffer to add the necessary buffer
engine_node = gm.graph.get_attr(engine_name)
11:21:32 py.warnings WARNING: torch\export\exported_program.py:1681: UserWarning: Unable to execute the generated python source code from the graph. The graph module will no longer be directly callable, but you can still run the ExportedProgram, and if needed, you can run the graph module eagerly using torch.fx.Interpreter.
warnings.warn(
W0126 11:21:32.310000 15968 D:\Program Files\jasna\torch\export\pt2_archive\_package.py:586] Expect archive file to be a file ending in .pt2, or is a buffer. Instead got {model_weights\lada_mosaic_restoration_model_generic_v1.2_clip10.trt_fp16.win.engine}
Compiling BasicVSR++ model (TensorRT workspace_size=9.42 GB). For large clip lengths (> 100) this can take up to a few hours.
11:21:37 py.warnings WARNING: torch_tensorrt\dynamo\utils.py:307: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(inputs),
11:24:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +136, GPU +0, now: CPU 20569, GPU 1330 (MiB)
11:25:04 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:25:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:04 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 1 inputs and 1525 output network tensors.
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 676576 bytes
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 5783552 bytes
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 205 steps to complete.
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 25.9975ms to assign 29 blocks to 205 nodes requiring 313214976 bytes.
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 313213952 bytes
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 10785602 bytes
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 303.043 seconds.
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 19 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 7
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 8
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +298, now: CPU 1, GPU 308 (MiB)
11:30:09 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +771, GPU +0, now: CPU 20386, GPU 1654 (MiB)
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3145ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.772236 seconds.
11:30:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 1, GPU 322 (MiB)
11:30:11 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +721, GPU +0, now: CPU 20372, GPU 1668 (MiB)
11:30:12 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:12 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2878ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.783385 seconds.
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 1, GPU 335 (MiB)
11:30:13 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +740, GPU +0, now: CPU 20381, GPU 1682 (MiB)
11:30:14 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2507ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.774809 seconds.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 2, GPU 348 (MiB)
11:30:16 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +743, GPU +0, now: CPU 20385, GPU 1696 (MiB)
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3185ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.777895 seconds.
11:30:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 2, GPU 361 (MiB)
11:30:18 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +724, GPU +0, now: CPU 20410, GPU 1710 (MiB)
11:30:19 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:19 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3223ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.773337 seconds.
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 2, GPU 374 (MiB)
11:30:20 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +750, GPU +0, now: CPU 20404, GPU 1724 (MiB)
11:30:21 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3195ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.792165 seconds.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 2, GPU 387 (MiB)
11:30:23 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +731, GPU +0, now: CPU 20422, GPU 1738 (MiB)
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2533ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.779684 seconds.
11:30:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 3, GPU 400 (MiB)
11:30:25 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +693, GPU +0, now: CPU 20422, GPU 1752 (MiB)
11:30:26 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:26 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3479ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.782922 seconds.
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 3, GPU 413 (MiB)
11:30:27 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +743, GPU +2, now: CPU 20437, GPU 1768 (MiB)
11:30:28 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3585ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.780015 seconds.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 3, GPU 426 (MiB)
11:30:30 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +708, GPU +2, now: CPU 20440, GPU 1782 (MiB)
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2333ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.808921 seconds.
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:31 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 4, GPU 440 (MiB)
11:30:32 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +772, GPU +2, now: CPU 20459, GPU 1796 (MiB)
11:30:33 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:33 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:33 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3337ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.80434 seconds.
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 4, GPU 453 (MiB)
11:30:34 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:35 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +692, GPU +0, now: CPU 20462, GPU 1810 (MiB)
11:30:35 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4044ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.784557 seconds.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 4, GPU 466 (MiB)
11:30:37 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:37 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +795, GPU +0, now: CPU 20499, GPU 1824 (MiB)
11:30:37 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2568ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.788323 seconds.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 5, GPU 479 (MiB)
11:30:39 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +758, GPU +2, now: CPU 20452, GPU 1840 (MiB)
11:30:40 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:40 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3169ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.790677 seconds.
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 5, GPU 492 (MiB)
11:30:41 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +756, GPU +2, now: CPU 20455, GPU 1854 (MiB)
11:30:42 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:42 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2691ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.788309 seconds.
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 5, GPU 505 (MiB)
11:30:43 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:44 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +738, GPU +0, now: CPU 20462, GPU 1868 (MiB)
11:30:44 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2459ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.769873 seconds.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 6, GPU 518 (MiB)
11:30:46 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:46 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +756, GPU +0, now: CPU 20467, GPU 1882 (MiB)
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2556ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.777338 seconds.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 6, GPU 531 (MiB)
11:30:48 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +749, GPU +2, now: CPU 20480, GPU 1898 (MiB)
11:30:49 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:49 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3585ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.792508 seconds.
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 6, GPU 544 (MiB)
11:30:50 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +743, GPU +2, now: CPU 20507, GPU 1912 (MiB)
11:30:51 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:51 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2503ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.79676 seconds.
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 7, GPU 558 (MiB)
11:30:52 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +769, GPU +2, now: CPU 20507, GPU 1926 (MiB)
11:30:53 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2554ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.774476 seconds.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 7, GPU 571 (MiB)
11:30:55 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:55 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +725, GPU +0, now: CPU 20514, GPU 1940 (MiB)
11:30:55 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2715ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.781695 seconds.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 7, GPU 584 (MiB)
11:30:57 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:30:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +693, GPU +0, now: CPU 20527, GPU 1954 (MiB)
11:30:58 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:30:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:30:58 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3644ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.805517 seconds.
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:30:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 8, GPU 597 (MiB)
11:30:59 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +763, GPU +2, now: CPU 20535, GPU 1970 (MiB)
11:31:00 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:00 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:00 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2941ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.792818 seconds.
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 8, GPU 610 (MiB)
11:31:01 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +697, GPU +2, now: CPU 20521, GPU 1984 (MiB)
11:31:02 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2579ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.789292 seconds.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 8, GPU 623 (MiB)
11:31:04 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:04 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +691, GPU +4, now: CPU 20528, GPU 2000 (MiB)
11:31:04 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2564ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.776042 seconds.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 9, GPU 636 (MiB)
11:31:06 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +695, GPU +2, now: CPU 20537, GPU 2014 (MiB)
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2781ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.822033 seconds.
11:31:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 9, GPU 649 (MiB)
11:31:08 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:09 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +679, GPU +0, now: CPU 20545, GPU 2028 (MiB)
11:31:09 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:09 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:09 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2603ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.799066 seconds.
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 9, GPU 662 (MiB)
11:31:10 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +690, GPU +2, now: CPU 20557, GPU 2044 (MiB)
11:31:11 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:11 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3496ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.780956 seconds.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 10, GPU 675 (MiB)
11:31:12 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:13 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +768, GPU +2, now: CPU 20561, GPU 2058 (MiB)
11:31:13 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2855ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.771354 seconds.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 10, GPU 689 (MiB)
11:31:15 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +753, GPU +0, now: CPU 20565, GPU 2072 (MiB)
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2545ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.7968 seconds.
11:31:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 10, GPU 702 (MiB)
11:31:17 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +673, GPU +0, now: CPU 20572, GPU 2086 (MiB)
11:31:18 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:18 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:18 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3281ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.793167 seconds.
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 10, GPU 715 (MiB)
11:31:19 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +751, GPU +2, now: CPU 20573, GPU 2102 (MiB)
11:31:20 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2619ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.777087 seconds.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 11, GPU 728 (MiB)
11:31:21 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +727, GPU +2, now: CPU 20580, GPU 2116 (MiB)
11:31:22 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2674ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.786908 seconds.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 11, GPU 741 (MiB)
11:31:24 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +642, GPU +2, now: CPU 20589, GPU 2130 (MiB)
11:31:24 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2649ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.808719 seconds.
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 11, GPU 754 (MiB)
11:31:26 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +739, GPU +0, now: CPU 20600, GPU 2144 (MiB)
11:31:27 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:27 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2797ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.826237 seconds.
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 12, GPU 767 (MiB)
11:31:28 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +714, GPU +0, now: CPU 20637, GPU 2158 (MiB)
11:31:29 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:29 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3608ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.772014 seconds.
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 12, GPU 780 (MiB)
11:31:30 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +687, GPU +2, now: CPU 20618, GPU 2174 (MiB)
11:31:31 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3854ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.776168 seconds.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 12, GPU 793 (MiB)
11:31:32 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +715, GPU +2, now: CPU 20628, GPU 2188 (MiB)
11:31:33 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.5147ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.789424 seconds.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 13, GPU 807 (MiB)
11:31:35 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +699, GPU +0, now: CPU 20623, GPU 2202 (MiB)
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2839ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.805197 seconds.
11:31:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 13, GPU 820 (MiB)
11:31:37 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +694, GPU +0, now: CPU 20640, GPU 2216 (MiB)
11:31:38 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:38 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2499ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.777555 seconds.
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 13, GPU 833 (MiB)
11:31:39 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +693, GPU +2, now: CPU 20642, GPU 2232 (MiB)
11:31:40 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3825ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.784509 seconds.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 14, GPU 846 (MiB)
11:31:41 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +696, GPU +2, now: CPU 20644, GPU 2246 (MiB)
11:31:42 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2558ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.824534 seconds.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 14, GPU 859 (MiB)
11:31:44 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +694, GPU +2, now: CPU 20649, GPU 2260 (MiB)
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.301ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.820281 seconds.
11:31:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 14, GPU 872 (MiB)
11:31:46 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +763, GPU +0, now: CPU 20659, GPU 2274 (MiB)
11:31:47 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:47 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2405ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.799448 seconds.
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 15, GPU 885 (MiB)
11:31:48 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +701, GPU +0, now: CPU 20674, GPU 2288 (MiB)
11:31:49 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.374ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.782011 seconds.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 15, GPU 898 (MiB)
11:31:50 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +765, GPU +2, now: CPU 20674, GPU 2304 (MiB)
11:31:51 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3684ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.80534 seconds.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 15, GPU 911 (MiB)
11:31:53 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +746, GPU +2, now: CPU 20683, GPU 2318 (MiB)
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3203ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.806716 seconds.
11:31:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 16, GPU 925 (MiB)
11:31:55 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +687, GPU +0, now: CPU 20686, GPU 2332 (MiB)
11:31:56 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:56 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3158ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.783436 seconds.
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 16, GPU 938 (MiB)
11:31:57 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:31:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +737, GPU +0, now: CPU 20692, GPU 2346 (MiB)
11:31:58 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:31:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2993ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.781284 seconds.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:31:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 16, GPU 951 (MiB)
11:31:59 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +736, GPU +2, now: CPU 20706, GPU 2362 (MiB)
11:32:00 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2595ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.810191 seconds.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 17, GPU 964 (MiB)
11:32:02 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +720, GPU +2, now: CPU 20726, GPU 2376 (MiB)
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3529ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.800981 seconds.
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 17, GPU 977 (MiB)
11:32:04 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +723, GPU +2, now: CPU 20711, GPU 2390 (MiB)
11:32:05 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:05 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2714ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.79744 seconds.
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 17, GPU 990 (MiB)
11:32:06 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +721, GPU +0, now: CPU 20726, GPU 2404 (MiB)
11:32:07 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:07 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:07 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3019ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.78428 seconds.
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 18, GPU 1003 (MiB)
11:32:08 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:09 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +717, GPU +0, now: CPU 20739, GPU 2418 (MiB)
11:32:09 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:09 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2456ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.789375 seconds.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 18, GPU 1016 (MiB)
11:32:10 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +709, GPU +2, now: CPU 20739, GPU 2434 (MiB)
11:32:11 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2771ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.806644 seconds.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 18, GPU 1029 (MiB)
11:32:13 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +716, GPU +2, now: CPU 20746, GPU 2448 (MiB)
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2667ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.798407 seconds.
11:32:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 18, GPU 1042 (MiB)
11:32:15 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +695, GPU +2, now: CPU 20748, GPU 2462 (MiB)
11:32:16 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:16 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 7 output network tensors.
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3337ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.782285 seconds.
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 19, GPU 1056 (MiB)
11:32:17 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +691, GPU +0, now: CPU 20769, GPU 2476 (MiB)
11:32:18 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:18 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:18 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 10 inputs and 6 output network tensors.
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2844ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3255488 bytes
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.799389 seconds.
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 19, GPU 1069 (MiB)
11:32:19 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +712, GPU +0, now: CPU 20764, GPU 2490 (MiB)
11:32:20 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 7 inputs and 8 output network tensors.
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 379968 bytes
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 80 steps to complete.
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.5901ms to assign 5 blocks to 80 nodes requiring 10485760 bytes.
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 6769344 bytes
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.30935 seconds.
11:32:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 7 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 20, GPU 1085 (MiB)
11:32:22 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +686, GPU +2, now: CPU 20779, GPU 2508 (MiB)
11:32:23 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:23 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2676ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.778818 seconds.
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 20, GPU 1098 (MiB)
11:32:24 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +708, GPU +2, now: CPU 20792, GPU 2522 (MiB)
11:32:25 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3608ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.742906 seconds.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 20, GPU 1111 (MiB)
11:32:26 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +725, GPU +0, now: CPU 20798, GPU 2536 (MiB)
11:32:27 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3037ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.758761 seconds.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 21, GPU 1125 (MiB)
11:32:28 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +704, GPU +0, now: CPU 20806, GPU 2550 (MiB)
11:32:29 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2526ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.776177 seconds.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 21, GPU 1138 (MiB)
11:32:31 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +693, GPU +2, now: CPU 20818, GPU 2566 (MiB)
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3357ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.753887 seconds.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 21, GPU 1151 (MiB)
11:32:33 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +705, GPU +2, now: CPU 20810, GPU 2582 (MiB)
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3837ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.742253 seconds.
11:32:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 22, GPU 1164 (MiB)
11:32:35 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +676, GPU +2, now: CPU 20819, GPU 2596 (MiB)
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3175ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.746737 seconds.
11:32:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 22, GPU 1177 (MiB)
11:32:37 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +737, GPU +2, now: CPU 20830, GPU 2610 (MiB)
11:32:38 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:38 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3474ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.76344 seconds.
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 22, GPU 1191 (MiB)
11:32:39 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +723, GPU +0, now: CPU 20834, GPU 2624 (MiB)
11:32:40 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:40 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2688ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.766495 seconds.
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 23, GPU 1204 (MiB)
11:32:41 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +747, GPU +0, now: CPU 20834, GPU 2638 (MiB)
11:32:42 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2585ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.749057 seconds.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 23, GPU 1217 (MiB)
11:32:43 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:44 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +759, GPU +2, now: CPU 20862, GPU 2654 (MiB)
11:32:44 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3743ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.746284 seconds.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 23, GPU 1230 (MiB)
11:32:46 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:46 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +763, GPU +2, now: CPU 20845, GPU 2668 (MiB)
11:32:46 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2442ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.78174 seconds.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 24, GPU 1243 (MiB)
11:32:48 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +728, GPU +2, now: CPU 20854, GPU 2682 (MiB)
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2473ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.759011 seconds.
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 24, GPU 1256 (MiB)
11:32:50 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +730, GPU +0, now: CPU 20861, GPU 2696 (MiB)
11:32:51 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:51 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3552ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.78135 seconds.
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 24, GPU 1270 (MiB)
11:32:52 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +722, GPU +0, now: CPU 20874, GPU 2710 (MiB)
11:32:53 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:53 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2391ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.759597 seconds.
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 25, GPU 1283 (MiB)
11:32:54 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:55 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +715, GPU +2, now: CPU 20889, GPU 2726 (MiB)
11:32:55 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:55 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:55 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2479ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.760831 seconds.
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 25, GPU 1296 (MiB)
11:32:56 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +730, GPU +2, now: CPU 20893, GPU 2740 (MiB)
11:32:57 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:32:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2548ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.779468 seconds.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:32:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 25, GPU 1309 (MiB)
11:32:58 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:32:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +690, GPU +2, now: CPU 20884, GPU 2754 (MiB)
11:32:59 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.352ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.767588 seconds.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 26, GPU 1322 (MiB)
11:33:00 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +676, GPU +0, now: CPU 20895, GPU 2768 (MiB)
11:33:01 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2929ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.753895 seconds.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1347 MiB
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 26, GPU 1335 (MiB)
11:33:03 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +730, GPU +0, now: CPU 20906, GPU 2782 (MiB)
11:33:03 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3304ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.737212 seconds.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1356 MiB
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 26, GPU 1349 (MiB)
11:33:05 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +676, GPU +2, now: CPU 20899, GPU 2798 (MiB)
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2707ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.766261 seconds.
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1370 MiB
11:33:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 26, GPU 1362 (MiB)
11:33:07 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +761, GPU +2, now: CPU 20907, GPU 2816 (MiB)
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3447ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.76883 seconds.
11:33:08 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1383 MiB
11:33:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 27, GPU 1375 (MiB)
11:33:09 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +713, GPU +2, now: CPU 20908, GPU 2830 (MiB)
11:33:10 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:10 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2619ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.745195 seconds.
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1396 MiB
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 27, GPU 1388 (MiB)
11:33:11 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +744, GPU +0, now: CPU 20918, GPU 2844 (MiB)
11:33:12 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:12 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2989ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.756001 seconds.
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1409 MiB
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 27, GPU 1401 (MiB)
11:33:13 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +731, GPU +0, now: CPU 20927, GPU 2858 (MiB)
11:33:14 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:14 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2641ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.776097 seconds.
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1422 MiB
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 28, GPU 1415 (MiB)
11:33:15 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +729, GPU +2, now: CPU 20937, GPU 2874 (MiB)
11:33:16 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2389ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.7718 seconds.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1436 MiB
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 28, GPU 1428 (MiB)
11:33:17 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +729, GPU +2, now: CPU 20949, GPU 2888 (MiB)
11:33:18 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:18 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2674ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.762101 seconds.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1449 MiB
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 28, GPU 1441 (MiB)
11:33:19 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +735, GPU +2, now: CPU 20967, GPU 2902 (MiB)
11:33:20 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2765ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.746998 seconds.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1462 MiB
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 29, GPU 1454 (MiB)
11:33:21 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +701, GPU +0, now: CPU 20968, GPU 2916 (MiB)
11:33:22 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3681ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.779276 seconds.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 1475 MiB
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 29, GPU 1467 (MiB)
11:33:24 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +688, GPU +0, now: CPU 20965, GPU 2930 (MiB)
11:33:24 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4689ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.772938 seconds.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 30 MiB, GPU 1488 MiB
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 29, GPU 1480 (MiB)
11:33:26 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +743, GPU +0, now: CPU 20972, GPU 2944 (MiB)
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2445ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.776694 seconds.
11:33:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 30 MiB, GPU 1501 MiB
11:33:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 30, GPU 1494 (MiB)
11:33:28 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +692, GPU +2, now: CPU 20980, GPU 2960 (MiB)
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2632ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.750652 seconds.
11:33:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 30 MiB, GPU 1515 MiB
11:33:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 30, GPU 1507 (MiB)
11:33:30 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +684, GPU +2, now: CPU 20998, GPU 2974 (MiB)
11:33:31 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:31 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2392ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.754517 seconds.
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 30 MiB, GPU 1528 MiB
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 30, GPU 1520 (MiB)
11:33:32 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +744, GPU +0, now: CPU 21002, GPU 2988 (MiB)
11:33:33 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:33 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:33 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2384ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.765388 seconds.
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 31 MiB, GPU 1541 MiB
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 31, GPU 1533 (MiB)
11:33:34 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:35 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +768, GPU +0, now: CPU 21008, GPU 3002 (MiB)
11:33:35 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:35 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:35 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2743ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.771732 seconds.
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 31 MiB, GPU 1554 MiB
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 31, GPU 1546 (MiB)
11:33:36 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:37 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +767, GPU +0, now: CPU 21021, GPU 3016 (MiB)
11:33:37 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:37 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2453ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.747778 seconds.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 31 MiB, GPU 1567 MiB
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 31, GPU 1560 (MiB)
11:33:38 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +721, GPU +2, now: CPU 21029, GPU 3032 (MiB)
11:33:39 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2385ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.761317 seconds.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 32 MiB, GPU 1581 MiB
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 32, GPU 1573 (MiB)
11:33:40 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +753, GPU +2, now: CPU 21049, GPU 3048 (MiB)
11:33:41 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3186ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.77937 seconds.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 32 MiB, GPU 1594 MiB
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 32, GPU 1586 (MiB)
11:33:42 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +751, GPU +4, now: CPU 21051, GPU 3064 (MiB)
11:33:43 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3989ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.766876 seconds.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 32 MiB, GPU 1607 MiB
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 32, GPU 1599 (MiB)
11:33:45 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +724, GPU +0, now: CPU 21051, GPU 3078 (MiB)
11:33:45 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4749ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.750972 seconds.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 33 MiB, GPU 1620 MiB
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 33, GPU 1612 (MiB)
11:33:47 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +721, GPU +0, now: CPU 21061, GPU 3092 (MiB)
11:33:47 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2478ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.752388 seconds.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 33 MiB, GPU 1633 MiB
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 33, GPU 1625 (MiB)
11:33:49 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +760, GPU +2, now: CPU 21077, GPU 3108 (MiB)
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2625ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.755423 seconds.
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 33 MiB, GPU 1646 MiB
11:33:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 33, GPU 1639 (MiB)
11:33:51 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +706, GPU +2, now: CPU 21049, GPU 3122 (MiB)
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2871ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.770516 seconds.
11:33:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 34 MiB, GPU 1660 MiB
11:33:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 34, GPU 1652 (MiB)
11:33:53 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +686, GPU +2, now: CPU 21063, GPU 3136 (MiB)
11:33:54 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:54 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3223ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.762317 seconds.
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 34 MiB, GPU 1673 MiB
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 34, GPU 1665 (MiB)
11:33:55 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +737, GPU +0, now: CPU 21066, GPU 3150 (MiB)
11:33:56 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:56 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3081ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.737436 seconds.
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 34 MiB, GPU 1686 MiB
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 34, GPU 1678 (MiB)
11:33:57 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:33:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +686, GPU +0, now: CPU 21072, GPU 3164 (MiB)
11:33:58 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:33:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:33:58 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3577ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.762681 seconds.
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 35 MiB, GPU 1699 MiB
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:33:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 34, GPU 1691 (MiB)
11:33:59 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +684, GPU +2, now: CPU 21082, GPU 3180 (MiB)
11:34:00 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:00 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:00 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3367ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.755612 seconds.
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 35 MiB, GPU 1712 MiB
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 35, GPU 1704 (MiB)
11:34:01 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +681, GPU +2, now: CPU 21088, GPU 3194 (MiB)
11:34:02 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:02 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2982ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.744209 seconds.
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 35 MiB, GPU 1725 MiB
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 35, GPU 1718 (MiB)
11:34:03 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:04 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +784, GPU +2, now: CPU 21128, GPU 3208 (MiB)
11:34:04 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3722ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.740475 seconds.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 36 MiB, GPU 1739 MiB
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 35, GPU 1731 (MiB)
11:34:05 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:06 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +757, GPU +0, now: CPU 21100, GPU 3222 (MiB)
11:34:06 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4315ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.740235 seconds.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 36 MiB, GPU 1752 MiB
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 36, GPU 1744 (MiB)
11:34:07 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:08 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +704, GPU +0, now: CPU 21108, GPU 3236 (MiB)
11:34:08 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2446ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.759288 seconds.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 36 MiB, GPU 1765 MiB
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 36, GPU 1757 (MiB)
11:34:09 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +707, GPU +2, now: CPU 21109, GPU 3252 (MiB)
11:34:10 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2689ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.747775 seconds.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 37 MiB, GPU 1778 MiB
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 36, GPU 1770 (MiB)
11:34:11 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +755, GPU +2, now: CPU 21114, GPU 3266 (MiB)
11:34:12 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2746ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.768565 seconds.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 37 MiB, GPU 1791 MiB
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 37, GPU 1784 (MiB)
11:34:14 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +726, GPU +2, now: CPU 21143, GPU 3284 (MiB)
11:34:14 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3471ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.748818 seconds.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 37 MiB, GPU 1805 MiB
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 37, GPU 1797 (MiB)
11:34:16 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +743, GPU +0, now: CPU 21173, GPU 3298 (MiB)
11:34:16 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2763ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.743119 seconds.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 38 MiB, GPU 1818 MiB
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 37, GPU 1810 (MiB)
11:34:18 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +763, GPU +0, now: CPU 21147, GPU 3312 (MiB)
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.5116ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.770793 seconds.
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 38 MiB, GPU 1831 MiB
11:34:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 38, GPU 1823 (MiB)
11:34:20 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +737, GPU +0, now: CPU 21174, GPU 3326 (MiB)
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 7 output network tensors.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2816ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.773257 seconds.
11:34:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 38 MiB, GPU 1844 MiB
11:34:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 38, GPU 1836 (MiB)
11:34:22 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +699, GPU +2, now: CPU 21156, GPU 3342 (MiB)
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 11 inputs and 6 output network tensors.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202416 bytes
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2727ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3329216 bytes
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.74992 seconds.
11:34:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 39 MiB, GPU 1856 MiB
11:34:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 38, GPU 1849 (MiB)
11:34:24 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +764, GPU +2, now: CPU 21165, GPU 3356 (MiB)
11:34:25 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 8 inputs and 8 output network tensors.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 379968 bytes
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 81 steps to complete.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.5708ms to assign 6 blocks to 81 nodes requiring 10486272 bytes.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 7441088 bytes
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.2799 seconds.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1868 MiB
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:26 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 8 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 39, GPU 1866 (MiB)
11:34:27 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +679, GPU +0, now: CPU 21183, GPU 3374 (MiB)
11:34:27 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3692ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.806101 seconds.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1887 MiB
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 39, GPU 1880 (MiB)
11:34:29 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +721, GPU +0, now: CPU 21196, GPU 3388 (MiB)
11:34:29 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2697ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.802252 seconds.
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1901 MiB
11:34:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:31 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:31 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 40, GPU 1893 (MiB)
11:34:31 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +762, GPU +2, now: CPU 21197, GPU 3404 (MiB)
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2907ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.776578 seconds.
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1914 MiB
11:34:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 40, GPU 1906 (MiB)
11:34:33 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +701, GPU +2, now: CPU 21204, GPU 3418 (MiB)
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3282ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.77301 seconds.
11:34:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1927 MiB
11:34:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 40, GPU 1919 (MiB)
11:34:35 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +695, GPU +2, now: CPU 21207, GPU 3432 (MiB)
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2528ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.784786 seconds.
11:34:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1940 MiB
11:34:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 41, GPU 1933 (MiB)
11:34:37 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +671, GPU +0, now: CPU 21224, GPU 3446 (MiB)
11:34:38 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:38 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:38 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2814ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.802234 seconds.
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1954 MiB
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 41, GPU 1946 (MiB)
11:34:39 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +650, GPU +0, now: CPU 21226, GPU 3460 (MiB)
11:34:40 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:40 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:40 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3107ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.780278 seconds.
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 1967 MiB
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 41, GPU 1959 (MiB)
11:34:41 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +658, GPU +2, now: CPU 21234, GPU 3476 (MiB)
11:34:42 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:42 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:42 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2844ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.781316 seconds.
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 42 MiB, GPU 1980 MiB
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 42, GPU 1972 (MiB)
11:34:43 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:44 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +744, GPU +2, now: CPU 21241, GPU 3490 (MiB)
11:34:44 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:44 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:44 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:44 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4865ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.775158 seconds.
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 42 MiB, GPU 1993 MiB
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 42, GPU 1986 (MiB)
11:34:45 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:46 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +759, GPU +0, now: CPU 21252, GPU 3504 (MiB)
11:34:46 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:46 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:46 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:46 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4417ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.807132 seconds.
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 42 MiB, GPU 2007 MiB
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 42, GPU 1999 (MiB)
11:34:47 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:48 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +683, GPU +0, now: CPU 21264, GPU 3520 (MiB)
11:34:48 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:48 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:48 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3162ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.788939 seconds.
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 43 MiB, GPU 2020 MiB
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 42, GPU 2012 (MiB)
11:34:49 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +752, GPU +2, now: CPU 21276, GPU 3536 (MiB)
11:34:50 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:50 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:50 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.441ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.780624 seconds.
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 43 MiB, GPU 2033 MiB
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 43, GPU 2025 (MiB)
11:34:51 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +745, GPU +2, now: CPU 21275, GPU 3550 (MiB)
11:34:52 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2633ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.781615 seconds.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 43 MiB, GPU 2046 MiB
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 43, GPU 2039 (MiB)
11:34:53 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +671, GPU +2, now: CPU 21291, GPU 3564 (MiB)
11:34:54 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2908ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.821516 seconds.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 44 MiB, GPU 2060 MiB
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 43, GPU 2052 (MiB)
11:34:55 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +731, GPU +2, now: CPU 21298, GPU 3580 (MiB)
11:34:56 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2534ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.806626 seconds.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 44 MiB, GPU 2073 MiB
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 44, GPU 2065 (MiB)
11:34:57 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:34:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +723, GPU +0, now: CPU 21306, GPU 3594 (MiB)
11:34:58 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3382ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.80535 seconds.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 44 MiB, GPU 2086 MiB
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:34:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 44, GPU 2078 (MiB)
11:34:59 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +698, GPU +2, now: CPU 21325, GPU 3610 (MiB)
11:35:00 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4009ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.800396 seconds.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 45 MiB, GPU 2099 MiB
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 44, GPU 2092 (MiB)
11:35:02 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +709, GPU +2, now: CPU 21318, GPU 3624 (MiB)
11:35:02 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3443ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.797801 seconds.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 45 MiB, GPU 2113 MiB
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 45, GPU 2105 (MiB)
11:35:04 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:04 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +711, GPU +2, now: CPU 21331, GPU 3638 (MiB)
11:35:04 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3186ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.829447 seconds.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 45 MiB, GPU 2126 MiB
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 45, GPU 2118 (MiB)
11:35:06 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +784, GPU +0, now: CPU 21602, GPU 3652 (MiB)
11:35:07 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:07 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:07 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:07 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.383ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.02068 seconds.
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 46 MiB, GPU 2139 MiB
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 45, GPU 2131 (MiB)
11:35:08 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:09 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +399, GPU +0, now: CPU 21674, GPU 3666 (MiB)
11:35:09 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4361ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.910561 seconds.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 46 MiB, GPU 2152 MiB
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 46, GPU 2145 (MiB)
11:35:11 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +693, GPU +2, now: CPU 21660, GPU 3682 (MiB)
11:35:11 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4101ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.872268 seconds.
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 46 MiB, GPU 2166 MiB
11:35:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 46, GPU 2158 (MiB)
11:35:13 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +647, GPU +2, now: CPU 21667, GPU 3696 (MiB)
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2506ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.837217 seconds.
11:35:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 46 MiB, GPU 2179 MiB
11:35:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 46, GPU 2171 (MiB)
11:35:15 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +645, GPU +0, now: CPU 21685, GPU 3710 (MiB)
11:35:16 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:16 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:16 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3742ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.809133 seconds.
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 47 MiB, GPU 2192 MiB
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 47, GPU 2184 (MiB)
11:35:17 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +683, GPU +0, now: CPU 21697, GPU 3724 (MiB)
11:35:18 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:18 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:18 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:18 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2573ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.795466 seconds.
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 47 MiB, GPU 2205 MiB
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 47, GPU 2198 (MiB)
11:35:19 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +715, GPU +2, now: CPU 21702, GPU 3740 (MiB)
11:35:20 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:20 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:20 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2532ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.774385 seconds.
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 47 MiB, GPU 2219 MiB
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 47, GPU 2211 (MiB)
11:35:21 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +744, GPU +2, now: CPU 21710, GPU 3758 (MiB)
11:35:22 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:22 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:22 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2699ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.801723 seconds.
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 2232 MiB
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 48, GPU 2224 (MiB)
11:35:23 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +689, GPU +2, now: CPU 21720, GPU 3772 (MiB)
11:35:24 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:24 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:24 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3218ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.801397 seconds.
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 2245 MiB
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 48, GPU 2237 (MiB)
11:35:25 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +746, GPU +0, now: CPU 21732, GPU 3786 (MiB)
11:35:26 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2701ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.79552 seconds.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 2258 MiB
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 48, GPU 2251 (MiB)
11:35:27 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +736, GPU +0, now: CPU 21736, GPU 3800 (MiB)
11:35:28 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.276ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.770373 seconds.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 49 MiB, GPU 2272 MiB
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 49, GPU 2264 (MiB)
11:35:29 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +697, GPU +2, now: CPU 21741, GPU 3816 (MiB)
11:35:30 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3914ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.800709 seconds.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 49 MiB, GPU 2285 MiB
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:31 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 49, GPU 2277 (MiB)
11:35:31 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:32 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +702, GPU +2, now: CPU 21733, GPU 3830 (MiB)
11:35:32 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2616ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.812707 seconds.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 49 MiB, GPU 2298 MiB
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 49, GPU 2290 (MiB)
11:35:33 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +713, GPU +0, now: CPU 21735, GPU 3844 (MiB)
11:35:34 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4313ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.792615 seconds.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 50 MiB, GPU 2311 MiB
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 50, GPU 2304 (MiB)
11:35:36 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +757, GPU +0, now: CPU 21745, GPU 3858 (MiB)
11:35:36 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.42ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.801002 seconds.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 50 MiB, GPU 2325 MiB
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 50, GPU 2317 (MiB)
11:35:38 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +689, GPU +2, now: CPU 21754, GPU 3874 (MiB)
11:35:38 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3915ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.78126 seconds.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 50 MiB, GPU 2338 MiB
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 50, GPU 2330 (MiB)
11:35:40 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +712, GPU +2, now: CPU 21761, GPU 3888 (MiB)
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3056ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.796555 seconds.
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 51 MiB, GPU 2351 MiB
11:35:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 50, GPU 2343 (MiB)
11:35:42 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +671, GPU +2, now: CPU 21774, GPU 3902 (MiB)
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2643ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.815333 seconds.
11:35:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 51 MiB, GPU 2364 MiB
11:35:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 51, GPU 2357 (MiB)
11:35:44 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +671, GPU +0, now: CPU 21782, GPU 3916 (MiB)
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3319ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.792801 seconds.
11:35:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 51 MiB, GPU 2378 MiB
11:35:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 51, GPU 2370 (MiB)
11:35:46 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +701, GPU +0, now: CPU 21789, GPU 3930 (MiB)
11:35:47 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:47 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:47 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4095ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.792316 seconds.
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 52 MiB, GPU 2391 MiB
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 51, GPU 2383 (MiB)
11:35:48 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +787, GPU +2, now: CPU 21828, GPU 3946 (MiB)
11:35:49 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:49 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:49 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3116ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.793936 seconds.
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 52 MiB, GPU 2404 MiB
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 52, GPU 2396 (MiB)
11:35:50 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +770, GPU +2, now: CPU 21818, GPU 3960 (MiB)
11:35:51 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:51 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:51 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.412ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.807368 seconds.
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 52 MiB, GPU 2417 MiB
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 52, GPU 2410 (MiB)
11:35:52 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +721, GPU +0, now: CPU 21831, GPU 3974 (MiB)
11:35:53 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:53 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:53 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2989ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.798048 seconds.
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 53 MiB, GPU 2431 MiB
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 52, GPU 2423 (MiB)
11:35:54 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:55 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +767, GPU +0, now: CPU 21857, GPU 3990 (MiB)
11:35:55 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:55 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:55 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:55 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4283ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.772758 seconds.
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 53 MiB, GPU 2444 MiB
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 53, GPU 2436 (MiB)
11:35:56 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +747, GPU +0, now: CPU 21837, GPU 4004 (MiB)
11:35:57 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2676ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.786534 seconds.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 53 MiB, GPU 2457 MiB
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:35:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 53, GPU 2449 (MiB)
11:35:58 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:35:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +711, GPU +2, now: CPU 21840, GPU 4020 (MiB)
11:35:59 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:35:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.4089ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.791999 seconds.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 54 MiB, GPU 2470 MiB
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 53, GPU 2463 (MiB)
11:36:00 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +719, GPU +2, now: CPU 21857, GPU 4034 (MiB)
11:36:01 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.374ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.789329 seconds.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 54 MiB, GPU 2484 MiB
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 54, GPU 2476 (MiB)
11:36:02 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +720, GPU +0, now: CPU 21865, GPU 4048 (MiB)
11:36:03 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2504ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.773563 seconds.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 54 MiB, GPU 2497 MiB
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 54, GPU 2489 (MiB)
11:36:04 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +687, GPU +0, now: CPU 21872, GPU 4062 (MiB)
11:36:05 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2474ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.793052 seconds.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 54 MiB, GPU 2510 MiB
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 54, GPU 2502 (MiB)
11:36:06 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +699, GPU +2, now: CPU 21880, GPU 4078 (MiB)
11:36:07 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3195ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.852992 seconds.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 55 MiB, GPU 2523 MiB
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:08 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 55, GPU 2516 (MiB)
11:36:09 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:09 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +759, GPU +2, now: CPU 21918, GPU 4092 (MiB)
11:36:09 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3688ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.798494 seconds.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 55 MiB, GPU 2537 MiB
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 55, GPU 2529 (MiB)
11:36:11 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +674, GPU +2, now: CPU 21899, GPU 4106 (MiB)
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3289ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.787708 seconds.
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 55 MiB, GPU 2550 MiB
11:36:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:13 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 55, GPU 2542 (MiB)
11:36:13 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +681, GPU +0, now: CPU 21919, GPU 4120 (MiB)
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3497ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.806489 seconds.
11:36:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 56 MiB, GPU 2563 MiB
11:36:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 56, GPU 2555 (MiB)
11:36:15 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +722, GPU +0, now: CPU 21926, GPU 4134 (MiB)
11:36:16 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:16 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:16 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2716ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.820182 seconds.
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 56 MiB, GPU 2576 MiB
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 56, GPU 2569 (MiB)
11:36:17 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +636, GPU +2, now: CPU 21960, GPU 4150 (MiB)
11:36:18 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:18 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:18 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:18 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3219ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.815865 seconds.
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 56 MiB, GPU 2590 MiB
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 56, GPU 2582 (MiB)
11:36:19 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +617, GPU +4, now: CPU 21924, GPU 4166 (MiB)
11:36:20 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:20 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:20 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2855ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.815258 seconds.
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 57 MiB, GPU 2603 MiB
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 57, GPU 2595 (MiB)
11:36:21 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +735, GPU +0, now: CPU 21951, GPU 4180 (MiB)
11:36:22 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3113ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.778967 seconds.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 57 MiB, GPU 2616 MiB
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 57, GPU 2608 (MiB)
11:36:23 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +743, GPU +0, now: CPU 21943, GPU 4194 (MiB)
11:36:24 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 13 inputs and 7 output network tensors.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.3611ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.776116 seconds.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 57 MiB, GPU 2629 MiB
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +10, now: CPU 57, GPU 2622 (MiB)
11:36:25 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +750, GPU +2, now: CPU 21949, GPU 4210 (MiB)
11:36:26 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 12 inputs and 6 output network tensors.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 202064 bytes
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 46 steps to complete.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.2705ms to assign 6 blocks to 46 nodes requiring 10486272 bytes.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 10485760 bytes
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 3402944 bytes
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 0.777158 seconds.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 58 MiB, GPU 2642 MiB
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 4 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +10, now: CPU 58, GPU 2635 (MiB)
11:36:27 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +689, GPU +2, now: CPU 21962, GPU 4228 (MiB)
11:36:28 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:29 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 10 inputs and 6 output network tensors.
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 465008 bytes
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 108 steps to complete.
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.3674ms to assign 7 blocks to 108 nodes requiring 22052864 bytes.
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 10556224 bytes
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.71404 seconds.
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2654 MiB
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 11 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 58, GPU 2666 (MiB)
11:36:31 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +732, GPU +2, now: CPU 21993, GPU 4262 (MiB)
11:36:31 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:32 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:32 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.036ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17449 seconds.
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2685 MiB
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 59, GPU 2692 (MiB)
11:36:33 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +665, GPU +0, now: CPU 22001, GPU 4290 (MiB)
11:36:34 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9036ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.1844 seconds.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2711 MiB
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 59, GPU 2718 (MiB)
11:36:36 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:36 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +752, GPU +2, now: CPU 22020, GPU 4320 (MiB)
11:36:37 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:37 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:37 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:37 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.0064ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18526 seconds.
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2737 MiB
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 60, GPU 2744 (MiB)
11:36:38 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +690, GPU +2, now: CPU 22047, GPU 4348 (MiB)
11:36:39 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8431ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.1751 seconds.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2763 MiB
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:40 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 60, GPU 2770 (MiB)
11:36:41 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +746, GPU +0, now: CPU 22058, GPU 4376 (MiB)
11:36:42 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:42 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:42 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9498ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.20883 seconds.
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2789 MiB
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 61, GPU 2796 (MiB)
11:36:43 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:44 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +635, GPU +0, now: CPU 22074, GPU 4404 (MiB)
11:36:44 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:44 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9086ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17568 seconds.
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2815 MiB
11:36:45 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 61, GPU 2822 (MiB)
11:36:46 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +734, GPU +2, now: CPU 22072, GPU 4434 (MiB)
11:36:47 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:47 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:47 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.957ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.19627 seconds.
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2841 MiB
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:48 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 61, GPU 2848 (MiB)
11:36:48 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +671, GPU +2, now: CPU 22091, GPU 4462 (MiB)
11:36:49 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.997ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15277 seconds.
11:36:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2867 MiB
11:36:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 62, GPU 2874 (MiB)
11:36:51 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +699, GPU +0, now: CPU 22096, GPU 4490 (MiB)
11:36:52 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:52 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:52 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.004ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.1706 seconds.
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 2893 MiB
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:53 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 62, GPU 2900 (MiB)
11:36:53 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +765, GPU +2, now: CPU 22137, GPU 4520 (MiB)
11:36:54 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:55 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:55 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:55 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9831ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.21686 seconds.
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 63 MiB, GPU 2919 MiB
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:56 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 63, GPU 2926 (MiB)
11:36:56 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +742, GPU +2, now: CPU 22134, GPU 4548 (MiB)
11:36:57 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:36:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.876ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17543 seconds.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 63 MiB, GPU 2945 MiB
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:36:58 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +22, now: CPU 63, GPU 2953 (MiB)
11:36:59 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:36:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +749, GPU +0, now: CPU 22145, GPU 4576 (MiB)
11:37:00 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:00 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:00 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:00 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9271ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.19795 seconds.
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 64 MiB, GPU 2972 MiB
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:01 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 64, GPU 2979 (MiB)
11:37:01 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +658, GPU +0, now: CPU 22157, GPU 4604 (MiB)
11:37:02 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.1465ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.24391 seconds.
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 64 MiB, GPU 2998 MiB
11:37:03 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:04 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 64, GPU 3005 (MiB)
11:37:04 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +757, GPU +2, now: CPU 22165, GPU 4634 (MiB)
11:37:05 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:05 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:05 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9022ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.24343 seconds.
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 65 MiB, GPU 3024 MiB
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:06 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 65, GPU 3031 (MiB)
11:37:06 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +692, GPU +2, now: CPU 22188, GPU 4662 (MiB)
11:37:07 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9465ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:08 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:09 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:09 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:09 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.19247 seconds.
11:37:09 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 65 MiB, GPU 3050 MiB
11:37:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:09 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 65, GPU 3057 (MiB)
11:37:09 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +712, GPU +0, now: CPU 22196, GPU 4692 (MiB)
11:37:10 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.1846ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17277 seconds.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 66 MiB, GPU 3076 MiB
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:11 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 66, GPU 3083 (MiB)
11:37:12 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +644, GPU +0, now: CPU 22211, GPU 4720 (MiB)
11:37:12 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:13 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:13 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:13 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8921ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.22135 seconds.
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 66 MiB, GPU 3102 MiB
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:14 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 66, GPU 3109 (MiB)
11:37:14 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:15 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +641, GPU +2, now: CPU 22234, GPU 4750 (MiB)
11:37:15 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.976ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.2075 seconds.
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 67 MiB, GPU 3128 MiB
11:37:16 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 66, GPU 3135 (MiB)
11:37:17 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:18 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +657, GPU +2, now: CPU 22242, GPU 4778 (MiB)
11:37:18 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:18 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:18 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:18 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9288ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15463 seconds.
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 67 MiB, GPU 3154 MiB
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:19 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 67, GPU 3161 (MiB)
11:37:19 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +683, GPU +0, now: CPU 22241, GPU 4806 (MiB)
11:37:20 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.1027ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17118 seconds.
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 67 MiB, GPU 3180 MiB
11:37:21 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:22 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 67, GPU 3187 (MiB)
11:37:22 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:23 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +641, GPU +2, now: CPU 22253, GPU 4836 (MiB)
11:37:23 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:23 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:23 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:23 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9737ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18777 seconds.
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 68 MiB, GPU 3206 MiB
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:24 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 68, GPU 3213 (MiB)
11:37:24 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +704, GPU +2, now: CPU 22269, GPU 4864 (MiB)
11:37:25 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.2039ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.14868 seconds.
11:37:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 68 MiB, GPU 3232 MiB
11:37:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:27 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 68, GPU 3239 (MiB)
11:37:27 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +738, GPU +0, now: CPU 22278, GPU 4892 (MiB)
11:37:28 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:28 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:28 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9417ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.14767 seconds.
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 69 MiB, GPU 3258 MiB
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:29 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 69, GPU 3265 (MiB)
11:37:29 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +660, GPU +0, now: CPU 22292, GPU 4920 (MiB)
11:37:30 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9035ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18571 seconds.
11:37:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 69 MiB, GPU 3284 MiB
11:37:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:32 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 69, GPU 3291 (MiB)
11:37:32 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +642, GPU +2, now: CPU 22307, GPU 4950 (MiB)
11:37:33 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:33 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9784ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18256 seconds.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 70 MiB, GPU 3310 MiB
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:34 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 70, GPU 3317 (MiB)
11:37:34 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:35 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +666, GPU +2, now: CPU 22318, GPU 4978 (MiB)
11:37:35 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:36 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:36 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:36 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9248ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17373 seconds.
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 70 MiB, GPU 3336 MiB
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:37 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 70, GPU 3343 (MiB)
11:37:37 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +683, GPU +0, now: CPU 22328, GPU 5006 (MiB)
11:37:38 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.0803ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.16226 seconds.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 71 MiB, GPU 3362 MiB
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:39 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 71, GPU 3369 (MiB)
11:37:40 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:40 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +645, GPU +0, now: CPU 22338, GPU 5034 (MiB)
11:37:40 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:41 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:41 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8659ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.20222 seconds.
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 71 MiB, GPU 3388 MiB
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:42 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 71, GPU 3395 (MiB)
11:37:42 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +713, GPU +2, now: CPU 22358, GPU 5064 (MiB)
11:37:43 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9493ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.1768 seconds.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 72 MiB, GPU 3414 MiB
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:44 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 72, GPU 3421 (MiB)
11:37:45 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +766, GPU +2, now: CPU 22373, GPU 5092 (MiB)
11:37:46 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:46 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:46 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:46 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.0537ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15499 seconds.
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 72 MiB, GPU 3440 MiB
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:47 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 72, GPU 3448 (MiB)
11:37:47 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:48 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +670, GPU +0, now: CPU 22375, GPU 5120 (MiB)
11:37:48 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8779ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.2353 seconds.
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 72 MiB, GPU 3467 MiB
11:37:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:50 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 72, GPU 3474 (MiB)
11:37:50 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +653, GPU +2, now: CPU 22387, GPU 5154 (MiB)
11:37:51 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:51 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:51 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8734ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18723 seconds.
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 73 MiB, GPU 3493 MiB
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:52 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 73, GPU 3500 (MiB)
11:37:52 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +726, GPU +2, now: CPU 22454, GPU 5182 (MiB)
11:37:53 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9913ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.16901 seconds.
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 73 MiB, GPU 3519 MiB
11:37:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:55 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 73, GPU 3526 (MiB)
11:37:55 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +696, GPU +2, now: CPU 22437, GPU 5212 (MiB)
11:37:56 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:56 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:56 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9282ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.16349 seconds.
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 74 MiB, GPU 3545 MiB
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:37:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 74, GPU 3552 (MiB)
11:37:57 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:37:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +669, GPU +2, now: CPU 22467, GPU 5240 (MiB)
11:37:58 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9102ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18309 seconds.
11:37:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 74 MiB, GPU 3571 MiB
11:38:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:00 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 74, GPU 3578 (MiB)
11:38:00 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:01 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +654, GPU +2, now: CPU 22451, GPU 5270 (MiB)
11:38:01 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:01 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:01 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:01 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.0152ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17419 seconds.
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 75 MiB, GPU 3597 MiB
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:02 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 75, GPU 3604 (MiB)
11:38:02 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:03 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +654, GPU +2, now: CPU 22479, GPU 5298 (MiB)
11:38:03 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:04 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:04 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:04 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9839ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15909 seconds.
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 75 MiB, GPU 3623 MiB
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:05 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 75, GPU 3630 (MiB)
11:38:05 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:06 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +653, GPU +0, now: CPU 22467, GPU 5326 (MiB)
11:38:06 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:06 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9677ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18144 seconds.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 76 MiB, GPU 3649 MiB
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 76, GPU 3656 (MiB)
11:38:08 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:08 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +657, GPU +2, now: CPU 22482, GPU 5356 (MiB)
11:38:08 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:09 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:09 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:09 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.0062ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18165 seconds.
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 76 MiB, GPU 3675 MiB
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:10 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 76, GPU 3682 (MiB)
11:38:10 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:11 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +726, GPU +2, now: CPU 22526, GPU 5384 (MiB)
11:38:11 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:11 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8918ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17225 seconds.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 77 MiB, GPU 3701 MiB
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:12 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 77, GPU 3708 (MiB)
11:38:13 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:13 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +737, GPU +0, now: CPU 22512, GPU 5412 (MiB)
11:38:14 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:14 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:14 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:14 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9287ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.1435 seconds.
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 77 MiB, GPU 3727 MiB
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:15 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 77, GPU 3734 (MiB)
11:38:15 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:16 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +675, GPU +0, now: CPU 22522, GPU 5440 (MiB)
11:38:16 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:16 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9564ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18917 seconds.
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 77 MiB, GPU 3753 MiB
11:38:17 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:18 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 77, GPU 3760 (MiB)
11:38:18 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:19 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +702, GPU +2, now: CPU 22523, GPU 5470 (MiB)
11:38:19 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:19 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:19 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:19 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9518ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.19079 seconds.
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 78 MiB, GPU 3779 MiB
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:20 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 78, GPU 3786 (MiB)
11:38:20 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:21 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +656, GPU +2, now: CPU 22540, GPU 5498 (MiB)
11:38:21 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:21 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.966ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17571 seconds.
11:38:22 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 78 MiB, GPU 3805 MiB
11:38:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:23 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 78, GPU 3812 (MiB)
11:38:23 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:24 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +702, GPU +0, now: CPU 22555, GPU 5526 (MiB)
11:38:24 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:24 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:24 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:24 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9176ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18119 seconds.
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 79 MiB, GPU 3831 MiB
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:25 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 79, GPU 3838 (MiB)
11:38:25 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:26 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +679, GPU +0, now: CPU 22573, GPU 5554 (MiB)
11:38:26 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:27 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:27 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:27 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.2195ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.22356 seconds.
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 79 MiB, GPU 3857 MiB
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:28 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 79, GPU 3864 (MiB)
11:38:28 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:29 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +716, GPU +2, now: CPU 22598, GPU 5584 (MiB)
11:38:29 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:29 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9023ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.18565 seconds.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 80 MiB, GPU 3883 MiB
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:30 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 80, GPU 3890 (MiB)
11:38:31 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:31 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +634, GPU +2, now: CPU 22626, GPU 5614 (MiB)
11:38:32 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:32 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:32 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:32 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9358ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.16656 seconds.
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 80 MiB, GPU 3909 MiB
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:33 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +22, now: CPU 80, GPU 3917 (MiB)
11:38:33 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:34 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +681, GPU +0, now: CPU 22616, GPU 5642 (MiB)
11:38:34 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:34 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.1982ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.21197 seconds.
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 81 MiB, GPU 3936 MiB
11:38:35 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:36 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 81, GPU 3943 (MiB)
11:38:36 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:37 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +680, GPU +2, now: CPU 22624, GPU 5672 (MiB)
11:38:37 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:37 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:37 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:37 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9712ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.21286 seconds.
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 81 MiB, GPU 3962 MiB
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:38 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 81, GPU 3969 (MiB)
11:38:38 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:39 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +669, GPU +2, now: CPU 22637, GPU 5700 (MiB)
11:38:39 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:40 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:40 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:40 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.891ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.16772 seconds.
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 82 MiB, GPU 3988 MiB
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:41 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 82, GPU 3995 (MiB)
11:38:41 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:42 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +712, GPU +0, now: CPU 22650, GPU 5728 (MiB)
11:38:42 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:42 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.1571ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.21077 seconds.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 82 MiB, GPU 4014 MiB
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:43 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 82, GPU 4021 (MiB)
11:38:44 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:45 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +686, GPU +0, now: CPU 22671, GPU 5756 (MiB)
11:38:45 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:45 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:45 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:45 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.8734ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.23482 seconds.
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 82 MiB, GPU 4040 MiB
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:46 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 82, GPU 4047 (MiB)
11:38:46 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:47 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +708, GPU +2, now: CPU 22682, GPU 5786 (MiB)
11:38:47 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:48 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:48 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:48 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.855ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.19669 seconds.
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 83 MiB, GPU 4066 MiB
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:49 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 83, GPU 4073 (MiB)
11:38:49 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:50 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +641, GPU +2, now: CPU 22709, GPU 5814 (MiB)
11:38:50 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:50 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.068ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17987 seconds.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 83 MiB, GPU 4092 MiB
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:51 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 83, GPU 4099 (MiB)
11:38:52 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:53 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +675, GPU +0, now: CPU 22730, GPU 5842 (MiB)
11:38:53 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:53 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:53 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:53 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9106ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.17217 seconds.
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 84 MiB, GPU 4118 MiB
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:54 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 84, GPU 4125 (MiB)
11:38:54 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:55 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +671, GPU +0, now: CPU 22728, GPU 5870 (MiB)
11:38:55 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 15 inputs and 6 output network tensors.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 76 steps to complete.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.9462ms to assign 8 blocks to 76 nodes requiring 22053376 bytes.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.15496 seconds.
11:38:56 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 84 MiB, GPU 4144 MiB
11:38:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:57 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 84, GPU 4151 (MiB)
11:38:57 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:38:58 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +694, GPU +2, now: CPU 22764, GPU 5900 (MiB)
11:38:58 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:38:58 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine build.
11:38:58 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:38:58 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 14 inputs and 4 output network tensors.
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 287424 bytes
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 1572864 bytes
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 77 steps to complete.
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 1.0881ms to assign 8 blocks to 77 nodes requiring 22053376 bytes.
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 22052864 bytes
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 5264578 bytes
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Compiler backend is used during engine execution.
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 1.12775 seconds.
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 85 MiB, GPU 4169 MiB
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:38:59 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 6746 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 6 MiB
INFO: [Torch-TensorRT] - [MS] Running engine with multi stream info
INFO: [Torch-TensorRT] - [MS] Number of aux streams is 1
INFO: [Torch-TensorRT] - [MS] Number of total worker streams is 2
INFO: [Torch-TensorRT] - [MS] The main stream provided by execute/enqueue calls is the first worker stream
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +21, now: CPU 85, GPU 4177 (MiB)
11:38:59 torch_tensorrt [TensorRT Conversion Context] WARNING: WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
11:39:00 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageChange] Init builder kernel library: CPU +732, GPU +2, now: CPU 22801, GPU 5928 (MiB)
11:39:00 torch_tensorrt [TensorRT Conversion Context] INFO: Global timing cache in use. Profiling results in this builder pass will be stored.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: [GraphReduction] The approximate region cut reduction algorithm is called.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Detected 65 inputs and 1 output network tensors.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Host Persistent Memory: 263984 bytes
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Device Persistent Memory: 0 bytes
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Max Scratch Memory: 0 bytes
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Started assigning block shifts. This will take 70 steps to complete.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: [BlockAssignment] Algorithm ShiftNTopDown took 0.5761ms to assign 8 blocks to 70 nodes requiring 21495808 bytes.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Activation Memory: 21495808 bytes
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Total Weights Memory: 4384288 bytes
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Engine generation completed in 6.54131 seconds.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 85 MiB, GPU 4750 MiB
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 27 bytes of code generator cache.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 4534990 bytes of compilation cache.
11:39:07 torch_tensorrt [TensorRT Conversion Context] INFO: Serialized 7408 timing cache entries
INFO: [Torch-TensorRT] - Loaded engine size: 5 MiB
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 85, GPU 4202 (MiB)
11:39:07 torch_tensorrt._compile WARNING: Provided model is a torch.fx.GraphModule and retrace is False, inputs or arg_inputs is not necessary during save.
11:39:21 py.warnings WARNING: torch_tensorrt\dynamo\_exporter.py:396: UserWarning: Attempted to insert a get_attr Node with no underlying reference in the owning GraphModule! Call GraphModule.add_submodule to add the necessary submodule, GraphModule.add_parameter to add the necessary Parameter, or nn.Module.register_buffer to add the necessary buffer
engine_node = gm.graph.get_attr(engine_name)
11:39:21 py.warnings WARNING: torch\export\exported_program.py:1681: UserWarning: Unable to execute the generated python source code from the graph. The graph module will no longer be directly callable, but you can still run the ExportedProgram, and if needed, you can run the graph module eagerly using torch.fx.Interpreter.
warnings.warn(
W0126 11:39:21.575000 15968 D:\Program Files\jasna\torch\export\pt2_archive\_package.py:586] Expect archive file to be a file ending in .pt2, or is a buffer. Instead got {model_weights\lada_mosaic_restoration_model_generic_v1.2_clip60.trt_fp16.win.engine}
Compiling model_weights\rfdetr-v2.onnx to model_weights\rfdetr-v2.bs4.fp16.win.engine
[01/26/2026-11:40:07] [TRT] [W] WARNING The logger passed into createInferBuilder differs from one already registered for an existing builder, runtime, or refitter. So the current new logger is ignored, and TensorRT will use the existing one which is returned by nvinfer1::getLogger() instead.
[01/26/2026-11:40:08] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +746, GPU +0, now: CPU 18995, GPU 6608 (MiB)
[01/26/2026-11:40:08] [TRT] [I] ----------------------------------------------------------------
[01/26/2026-11:40:08] [TRT] [I] ONNX IR version: 0.0.8
[01/26/2026-11:40:08] [TRT] [I] Opset version: 17
[01/26/2026-11:40:08] [TRT] [I] Producer name: pytorch
[01/26/2026-11:40:08] [TRT] [I] Producer version: 2.8.0
[01/26/2026-11:40:08] [TRT] [I] Domain:
[01/26/2026-11:40:08] [TRT] [I] Model version: 0
[01/26/2026-11:40:08] [TRT] [I] Doc string:
[01/26/2026-11:40:08] [TRT] [I] ----------------------------------------------------------------
[01/26/2026-11:40:08] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/26/2026-11:40:10] [TRT] [I] Compiler backend is used during engine build.
[01/26/2026-11:42:08] [TRT] [I] Detected 1 inputs and 3 output network tensors.
[01/26/2026-11:42:09] [TRT] [I] Total Host Persistent Memory: 87456 bytes
[01/26/2026-11:42:09] [TRT] [I] Total Device Persistent Memory: 0 bytes
[01/26/2026-11:42:09] [TRT] [I] Max Scratch Memory: 214361088 bytes
[01/26/2026-11:42:09] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 60 steps to complete.
[01/26/2026-11:42:09] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 1.4255ms to assign 8 blocks to 60 nodes requiring 461825024 bytes.
[01/26/2026-11:42:09] [TRT] [I] Total Activation Memory: 461825024 bytes
[01/26/2026-11:42:09] [TRT] [I] Total Weights Memory: 69964928 bytes
[01/26/2026-11:42:09] [TRT] [I] Compiler backend is used during engine execution.
[01/26/2026-11:42:09] [TRT] [I] Engine generation completed in 120.625 seconds.
[01/26/2026-11:42:09] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 171 MiB, GPU 8404 MiB
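The cycles above repeat the same counters per engine build (generation time, activation memory, weights memory). As a hedged aside, not part of the original log, here is a minimal stdlib-only Python sketch that pulls the per-build engine generation time and activation memory out of log text shaped like these lines; the function name `summarize` and the dictionary keys are arbitrary choices for illustration:

```python
import re

# Patterns matching the literal TensorRT log phrasing seen above.
TIME_RE = re.compile(r"Engine generation completed in (?P<secs>[\d.]+) seconds")
MEM_RE = re.compile(r"Total Activation Memory: (?P<bytes>\d+) bytes")

def summarize(log_text: str) -> dict:
    """Collect engine-build durations and activation-memory sizes
    from Torch-TensorRT / TensorRT build log text."""
    times = [float(m.group("secs")) for m in TIME_RE.finditer(log_text)]
    mems = [int(m.group("bytes")) for m in MEM_RE.finditer(log_text)]
    return {
        "builds": len(times),            # one "Engine generation completed" line per build
        "total_secs": sum(times),        # wall-clock total across all builds
        "activation_bytes": mems,        # Total Activation Memory per build
    }

# Tiny sample using values copied from the log above.
sample = """\
INFO: Total Activation Memory: 22052864 bytes
INFO: Engine generation completed in 1.23482 seconds.
INFO: Engine generation completed in 120.625 seconds.
"""
print(summarize(sample))
```

Running `summarize` over the full paste would show that the short ~1.2 s builds reuse the global timing cache, while the final ONNX build (120.625 s) pays the full profiling cost under its local, non-persisted cache.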