Summary on deep learning framework --- PyTorch
Updated on 2018-07-22 21:25:42
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "4"

export CUDA_VISIBLE_DEVICES=0
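The two settings above do the same job from Python and from the shell. A minimal sketch of the Python route follows; `select_gpus` is a hypothetical helper name, not part of PyTorch, and it assumes the variable is set before any CUDA library is imported.

```python
import os

def select_gpus(ids):
    """Expose only the given GPU indices to this process (hypothetical helper)."""
    value = ",".join(str(i) for i in ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value

# Call this before `import torch` (or any other CUDA library) so the setting is seen.
visible = select_gpus([0, 2])
print(visible)
```

Inside the process the selected GPUs are renumbered from 0, so physical GPU 2 would appear as device 1 here.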
pip install thop

import torch
from torchvision.models import resnet50
from thop import profile

model = resnet50()
input = torch.randn(1, 3, 224, 224)
macs, params = profile(model, inputs=(input,))
Install a package into a specific Python X.Y interpreter:
python3.6 -m pip install scipy  ## will install scipy into python3.6
python3.8 -m pip install scipy  ## will install scipy into python3.8
python2.7 -m pip install scipy  ## will install scipy into python2.7
python2.7 -m pip --default-timeout=6000 install torch==0.4.1  ## raises pip's network timeout to 6000 s while installing torch
tar -zcvf filename.tar.gz *
RuntimeError: one_hot is only applicable to index tensor.
Solution: convert your input data to integer indices with .long():
one_hot = torch.nn.functional.one_hot(YourInput.long(), n)
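A small sketch of the fix above, assuming a PyTorch version that provides `torch.nn.functional.one_hot`; the label values are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Float labels like these trigger "one_hot is only applicable to index tensor".
labels = torch.tensor([0.0, 2.0, 1.0])

# Casting to int64 with .long() makes them valid class indices.
one_hot = F.one_hot(labels.long(), num_classes=3)
print(one_hot)
```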
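Step 1, install dkms: sudo apt-get install dkms
Step 2, check the version of the driver that the machine cannot connect to: ls -l /usr/src/
Step 3, reinstall the matching driver with dkms: sudo dkms install -m nvidia -v 510.73.05
Run a program in the background: nohup python main.py &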
office: ed2k://|file|cn_office_professional_plus_2019_x86_x64_dvd_5e5be643.iso|3775004672|1E4FFA5240F21F60DC027F73F1C62FF4|/
visio: ed2k://|file|cn_visio_professional_2019_x86_x64_dvd_97bda48c.iso|3775004672|26D248309B18EDBEEBE8DC43C55995DB|/
1. Add the PPA repositories that provide the program and its dependencies. Open a terminal and run the commands below:
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo add-apt-repository ppa:lkoppel/opencv
sudo add-apt-repository ppa:janisozaur/cmake-update
sudo add-apt-repository ppa:inivation-ppa/inivation-xenial
2. Update the package index:
sudo apt-get update
3. Install dv-gui:
sudo apt-get install dv-gui
pytracking issue:
RuntimeError: Error building extension '_prroi_pooling'
Answer: export CUDA_HOME='/usr/local/cuda'
Question: StopIteration: Caught StopIteration in replica 0 on device 0.
A1: downgrade pytorch==1.5.0 -> pytorch==1.4.0
A2: find the corresponding line and replace next(self.parameters()).dtype with torch.float32
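0. Counting the files in a folder on Linux
Count the files in the current directory (directories excluded): ls -l | grep "^-" | wc -l
Count the files in the current directory, including subdirectories: ls -lR | grep "^-" | wc -l
Count the folders (directories) under a directory, including subdirectories: ls -lR | grep "^d" | wc -l
Command breakdown: ls -l prints one entry per line in long format (-R recurses into subdirectories); grep "^-" keeps regular files and grep "^d" keeps directories; wc -l counts the remaining lines.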
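The same counts can be sketched in Python with the standard library. `count_entries` is a hypothetical helper name; unlike `grep "^-"`, `os.walk` also reports symlinks to files as files, so the numbers can differ slightly on trees that contain links.

```python
import os
import tempfile

def count_entries(root, recursive=False):
    """Return (n_files, n_dirs) under root, similar to the ls | grep | wc pipelines."""
    n_files = n_dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        n_files += len(filenames)
        n_dirs += len(dirnames)
        if not recursive:
            break  # stop after the top level
    return n_files, n_dirs

# Demo on a throwaway directory: two files at the top level, one in a subfolder.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for name in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        open(os.path.join(root, name), "w").close()
    top = count_entries(root)
    deep = count_entries(root, recursive=True)
print(top, deep)
```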
1. install the pytorch version 0.1.11
2. what happens when the following errors occur
3. Converting data between GPU and CPU:
(1) CPU ---> GPU: a.cuda()
(2) GPU ---> CPU: a.cpu()
(3) torch.tensor ---> numpy array:
a_numpy_style = a.numpy()
(4) numpy array ---> torch.tensor:
>>> import numpy as np
>>> import torch
>>> a = np.ones(5)
>>> b = torch.from_numpy(a)
>>> np.add(a, 1, out=a)
array([ 2.,  2.,  2.,  2.,  2.])
>>> print(a)
[ 2.  2.  2.  2.  2.]
>>> print(b)

 2
 2
 2
 2
 2
[torch.DoubleTensor of size 5]

>>> c = b.numpy()
>>> c
array([ 2.,  2.,  2.,  2.,  2.])
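The session above works because `torch.from_numpy` shares memory with the source array. A short sketch of that behaviour, and of taking an independent copy with `clone()`, assuming a CPU tensor:

```python
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)   # b views the same memory as a
np.add(a, 1, out=a)       # the in-place update is visible through b
c = b.clone().numpy()     # clone first to get an independent copy
a[0] = 10.0               # changes a and b, but not c
print(b[0].item(), c[0])
```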
4. Variable and Tensor:
==>> the program raised an error:
expected a Variable, but got a FloatTensor, ~~~~
==>> this can be solved by adding:
from torch.autograd import Variable
hard_neg_differ_ = Variable(hard_neg_differ_)
==>> this turns hard_neg_differ_ into a Variable, so it is no longer a FloatTensor.
The session below shows this:
>>> import torch
>>> x = torch.Tensor(2, 3, 4)
>>> x

(0 ,.,.) =
1.00000e-37 *
   2.4168  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000

(1 ,.,.) =
1.00000e-37 *
   0.0000  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 2x3x4]

>>> from torch.autograd import Variable
>>> x = Variable(x)
>>> x
Variable containing:
(0 ,.,.) =
1.00000e-37 *
   2.4168  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000

(1 ,.,.) =
1.00000e-37 *
   0.0000  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 2x3x4]
However, you cannot directly convert a Variable with numpy() or similar. You can read the values stored in the Variable and convert them to a numpy array through:
value = variable.data.numpy()
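Since PyTorch 0.4, Variable and Tensor are merged, so the same conversion is usually written with `detach()`. The sketch below assumes such a version; the tensor `x` is made up for illustration.

```python
import torch

x = torch.ones(2, 3, requires_grad=True)

# detach() drops the autograd history, cpu() is a no-op for CPU tensors,
# and numpy() then returns an array view of the values.
value = x.detach().cpu().numpy()
print(value.shape)
```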
7. pytorch save checkpoints
torch.save(model.state_dict(), filename)
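A minimal save-and-restore round trip for the call above. The tiny `nn.Linear` model and the file name `checkpoint.pth` are placeholders, not the author's setup.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "checkpoint.pth")
    torch.save(model.state_dict(), filename)        # save the weights only

    restored = nn.Linear(4, 2)                       # same architecture as the saved model
    state = torch.load(filename, map_location="cpu")
    restored.load_state_dict(state)

same = torch.equal(model.weight, restored.weight)
print(same)
```

Saving the `state_dict` rather than the whole model object keeps the checkpoint independent of the class definition's file path.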
8. install python3.5 on ubuntu system:
sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python3.5
when testing, just type: python3.5
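9. load image to tensor & save tensor data to image files.
import numpy as np
import torch
from PIL import Image

def tensor_load_rgbimage(filename, size=None, scale=None):
    # Load an RGB image as a float tensor of shape (C, H, W).
    img = Image.open(filename)
    if size is not None:
        # Image.LANCZOS is the filter formerly exposed as Image.ANTIALIAS
        img = img.resize((size, size), Image.LANCZOS)
    elif scale is not None:
        img = img.resize((int(img.size[0] / scale), int(img.size[1] / scale)), Image.LANCZOS)
    img = np.array(img).transpose(2, 0, 1)
    img = torch.from_numpy(img).float()
    return img

def tensor_save_rgbimage(tensor, filename, cuda=False):
    # Clamp to [0, 255], move to the CPU if needed, and write an image file.
    if cuda:
        img = tensor.clone().cpu().clamp(0, 255).numpy()
    else:
        img = tensor.clone().clamp(0, 255).numpy()
    img = img.transpose(1, 2, 0).astype('uint8')
    img = Image.fromarray(img)
    img.save(filename)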
10. the often used operations in pytorch:
11. error: RuntimeError: tensors are on different GPUs
==>> this happens when you move the data to the GPU but not the model defined beforehand.
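One common way to avoid this mismatch is to pick a single device and move both the model and the data to it. This is a sketch with a placeholder model; it falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2).to(device)     # move the model's parameters
data = torch.randn(4, 8).to(device)    # move the input to the same device
output = model(data)
print(output.device)
```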
12. torch.mm and torch.spmm
torch.mm(mat1, mat2) ---> matrix product of the two input matrices;
torch.spmm() --->
13. Expected object of type torch.cuda.LongTensor but found type torch.cuda.DoubleTensor for argument #2 'target'
File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 1332, in nll_loss
    return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.cuda.DoubleTensor for argument #2 'target'
14. RuntimeError: multi-target not supported at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:16
File "run_train.py", line 150, in train_gcnTracker
    loss_train = F.nll_loss(output.float(), labels.long())
File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 1332, in nll_loss
    return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: multi-target not supported at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:16
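Both errors above concern the `target` argument of `F.nll_loss`, which expects a 1-D tensor of int64 class indices. A likely cause of the "multi-target" message is a 2-D (for example one-hot) label tensor. The sketch below assumes that situation, with made-up data, and converts the labels with `argmax`.

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 3), dim=1)   # (batch, classes)

# One-hot double labels: wrong dtype and wrong shape for nll_loss.
one_hot_labels = torch.tensor(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=torch.float64
)

# argmax gives a 1-D int64 tensor of class indices, which is what nll_loss wants.
target = one_hot_labels.argmax(dim=1)
loss = F.nll_loss(log_probs, target)
print(target, loss.item())
```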
15. Set GPU ID: export CUDA_VISIBLE_DEVICES=0
16. fig.savefig(os.path.join(savefig_dir, '0000.jpg'), dpi=dpi)
17. when I use torch.cat() to concatenate two tensors, it shows me errors as follows:
*** RuntimeError: Expected a Tensor of type torch.DoubleTensor but found a type torch.FloatTensor for sequence element 1 in sequence argument at position #1 'tensors'
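One way to resolve the mismatch is to cast the tensors to a common dtype before concatenating. The tensors below are placeholders standing in for the author's data.

```python
import torch

a = torch.zeros(2, 3, dtype=torch.float64)   # DoubleTensor
b = torch.ones(2, 3, dtype=torch.float32)    # FloatTensor

# Cast b to a's dtype so every element of the sequence has the same type.
merged = torch.cat([a, b.to(a.dtype)], dim=0)
print(merged.dtype, merged.shape)
```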
18. RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
==== Start Cycle 0 ====
('==>> loss_attention: ', tensor(0.6981))
Traceback (most recent call last):
File "step_1_train_attention.py", line 187, in
==>> try this: loss_attention = Variable(loss_attention, requires_grad=True)
19. print the loss variation along with training:
20. Model initialization for fc layers
## Initialization for fc layers
from torch.nn import init

self.fc1 = nn.Linear(1024, 1024)
init.xavier_normal_(self.fc1.weight)  # in-place variant; the older name init.xavier_normal is deprecated
21. PyTorch implementation for convolutional feature visualization:
22. ValueError: invalid literal for int() with base 10: '135.5'
(Pdb) int(x)
*** ValueError: invalid literal for int() with base 10: '135.5'
(Pdb) round(float(x))
136.0
(Pdb)
==>> Solution:
int(round(float(initial_BBox[2])))
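The same conversion as a small function; `to_int` is a hypothetical name. Note that Python 3 rounds exact halves to the nearest even integer, so results for values ending in .5 can differ from Python 2.

```python
def to_int(text):
    """Parse a numeric string such as '135.5' and round it to an int."""
    return int(round(float(text)))

width = to_int("135.5")
height = to_int("12")
print(width, height)
```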
23. Loading a pre-trained VGG model (the checkpoint file and cfg 'D' used below are the 16-layer network; cfg 'E' is the 19-layer one):
import torch
import torch.nn as nn

model_root = './vgg16-397923af.pth'

def make_layers(cfg, batch_norm=False):
    # Build the convolutional part from a config list; 'M' means max-pooling.
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

cfg = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

# VGG is the model class; it is not defined in these notes and is assumed to exist elsewhere.
model = VGG(make_layers(cfg['D']))
model.load_state_dict(torch.load(model_root))
VGG_net = VGG(model)
VGG_net = VGG_net.cuda()
24. *** RuntimeError: CUDNN_STATUS_BAD_PARAM
==>> caused by a mismatch between the input dimension and the feature dimension the layer expects.
25. *** RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
To reduce memory usage, during the .backward() call, all the intermediary results are deleted when they are not needed anymore.
Hence if you try to call .backward() again, the intermediary results don't exist and the backward pass cannot be performed (and you get the error you see).
You can call .backward(retain_graph=True) to make a backward pass that will not delete intermediary results, and so you will be able to call .backward() again.
All but the last call to backward should have the retain_graph=True option.
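A minimal sketch of the rule above on a toy graph: the first call keeps the graph, the last one does not. Note that gradients accumulate across the two calls.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 2).sum()

y.backward(retain_graph=True)   # keep intermediary results for another pass
first = x.grad.clone()          # dy/dx = 2x -> [2, 4]

y.backward()                    # last call: the graph may be freed now
second = x.grad.clone()         # gradients accumulate -> [4, 8]
print(first, second)
```

Call `x.grad.zero_()` between passes if accumulation is not wanted.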
26. RuntimeError: function ConcatBackward returned a gradient different than None at position 3, but the corresponding forward input was not a Variable
g_loss.backward(retain_graph=True)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
RuntimeError: function ConcatBackward returned a gradient different than None at position 3, but the corresponding forward input was not a Variable
==>> Operations such as output = torch.cat(Variable(x), y) will cause this problem. You need to check the variables you feed to the neural network and make sure they are all Variable.
27. The following error appeared when using nn.BCELoss():
"Gets fixed by applying nn.BCEWithLogitsLoss() instead of nn.BCELoss() in networks.py line 82 -- it restricts loss values between 0 and 1 before applying the loss."
28. Shit issues about nn.GRU to encode the natural language: RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS
Traceback (most recent call last): File "train_mim_langTracking.py", line 373, in
29. Deep copy with "clone" operation
A29: vis_feat = x.data.clone()
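30. How to Train the Deep Network with Multi-GPU in one machine
align_h = model.roi_align_model.aligned_height
In the multi-GPU case, this must be changed to: align_h = model.module.roi_align_model.aligned_height. The only difference is the extra module in the middle, which acts as a bridge to the wrapped model.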
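A sketch of why the extra `.module` is needed, assuming the model is wrapped in `nn.DataParallel` (the notes do not show the wrapping code). `unwrap` and the toy `Net` class are hypothetical names for illustration.

```python
import torch.nn as nn

def unwrap(model):
    """Return the underlying module whether or not it is wrapped (hypothetical helper)."""
    return model.module if isinstance(model, nn.DataParallel) else model

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.aligned_height = 7          # stands in for a custom attribute
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

plain = Net()
wrapped = nn.DataParallel(plain)         # custom attributes now live on wrapped.module

align_h = unwrap(wrapped).aligned_height
print(align_h)
```

Going through a helper like this lets the same code run with and without the parallel wrapper.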
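Another bug: code that used to run fine stops working once the parallelization module is added. For example:
==== Start Cycle 0 ====
Traceback (most recent call last): File "train_lang_coAttention_MultiGPU_version.py", line 311, in
Here, it complains that I passed only 2 arguments. But without this module the code runs fine. Does this mean some existing bug is causing the error? If so, which bug?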
31. (pysot) wangxiao@wx:~/Downloads/pysot/experiments/siamrpn_mobilev2_l234_dwxcorr$ CUDA_LAUNCH_BLOCKING=1 python -u ../../tools/test.py --snapshot model.pth --dataset VOT2018 --config config.yaml
loading VOT2018: 100%|██████████████████████████████████| 60/60 [00:00<00:00, 66.26it/s, zebrafish1]
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535493744281/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
cudaCheckError() failed : an illegal memory access was encountered
A31. Does anybody know what happened?
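32. This function should not be used casually: torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)
The original code did not contain this line and converged normally. Later, for some reason, I added it. As a result the loss would not go down and stayed around 200, which was a real trap. After I commented it out and reran, the loss dropped within minutes.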
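`clip_grad_norm_` returns the total gradient norm measured before clipping, which can help judge whether a threshold such as 0.25 is far too small for a given model. The sketch below uses a toy model with deliberately large gradients; it is not the author's training code.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
loss = model(torch.ones(2, 3)).sum() * 100   # scaled up to produce large gradients
loss.backward()

# Returns the norm before clipping and rescales the gradients in place.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.25)

# Norm of the gradients after clipping.
clipped = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
print(float(total_norm), float(clipped))
```

Here the gradients are shrunk by a factor of about 1600, which would slow learning in the way the note describes if the learning rate is not adjusted.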
Q33. Model transform from caffe to pytorch:
Q34. SRU issue: ModuleNotFoundError: No Module named 'cuda_functional':
A34. pip install sru[cuda] will solve this problem.
Q35. Save only or load only part of pre-trained pyTorch model:
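Q36. During training with PyTorch, GPU memory usage keeps growing and eventually leads to out-of-memory?
A36. One possible cause (which I ran into myself): when computing the total loss, the loss tensors must not be added up directly. Use .data[0] to take the value out first. Otherwise PyTorch automatically adds that part to the computation graph, so memory usage keeps growing until it blows up.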
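In PyTorch 0.4 and later, `.item()` plays the role that `.data[0]` played for scalar losses. A sketch with a toy model, assuming such a version:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
total_loss = 0.0

for _ in range(3):
    loss = model(torch.ones(2, 3)).pow(2).mean()
    # .item() copies the scalar out as a Python float, so no graph is kept alive.
    total_loss += loss.item()

print(type(total_loss))
```

Writing `total_loss += loss` instead would keep every iteration's graph reachable through the running sum.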
Q37. *** RuntimeError: _sigmoid_forward_out is not implemented for type torch.cuda.LongTensor
A37. So, how to transform the torch.cuda.LongTensor into a torch.cuda.FloatTensor? Try this:
maxIoU = maxIoU.type(torch.cuda.FloatTensor)
Q38. File "training_demo.py", line 236, in
A38. First, you need to do: from torch.autograd import Variable, then, add this line before the backward function:
L2_loss = Variable(L2_loss, requires_grad=True)
Q39. ImportError: cannot import name 'imresize' from 'scipy.misc' (/home/wangxiao/anaconda3/envs/goturn2.0/lib/python3.7/site-packages/scipy/misc/__init__.py)
pip install Pillow
pip install scipy==1.1.0
Q40. ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory
conda install cudatoolkit
conda install cudnn
Q41. ImportError: ./modules/roi_align/roi_align_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE
A41. First, you need to remove the folder "build", generated from the previous version. Then build this file again:
rm -rf build/
python setup.py build_ext --inplace
Shit, the aforementioned solution cannot handle the following issue:
ImportError: ./modules/roi_align/roi_align_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs.
Try this: conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
Q42. Segmentation fault (core dumped)
A42. This may be caused by forgetting to move the data to the GPU. Find the line where the bug occurred, and do this: feature = feature.cuda()
Q43. Set edge color with PIL for visualization (RGB(0, 153, 255), #0099FF color lookup):
Q44.
Q45. AssertionError:
So try this:
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
Q46. ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device
Q47.
from torchvision import _C
ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory
A47. pip install torchvision==0.2.2 will be ok.
Q48. ImportError: libmkl_rt.so: cannot open shared object file: No such file or directory
A48. First, you need to locate the lib files by:
# locate libmkl_rt.so
/home/wangxiao/anaconda3/lib/libmkl_rt.so
/home/wangxiao/anaconda3/pkgs/mkl-2018.0.2-1/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/GlobalTrack/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/TrackingNet/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/aaai/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/beamtracker/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/goturn2.0/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/pysot/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/rtmdnet/lib/libmkl_rt.so
/home/wangxiao/miniconda3/envs/sint3.0/lib/libmkl_rt.so
/home/wangxiao/miniconda3/lib/libmkl_rt.so
/home/wangxiao/miniconda3/pkgs/mkl-2019.4-243/lib/libmkl_rt.so
Then, you can execute the following:
export LD_LIBRARY_PATH=/home/wangxiao/anaconda3/pkgs/mkl-2018.0.2-1/lib/:$LD_LIBRARY_PATH
Then, this problem can be solved.
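Q49. ModuleNotFoundError: No module named 'six'
A49. This error can't be solved by simply: pip install six. I tried this: conda install -c intel mkl-service
Q50. Installing software with conda install fails with the following error: Segmentation fault (core dumped)
After searching, I found the likely cause: a poor network connection interrupted an earlier installation midway, so some packages were cached locally but are actually incomplete.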
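Q51. Generating a set of random numbers:
Random values
Returns a sample drawn from the standard normal distribution.
Notes
For random samples from N(mu, sigma^2), use:
sigma * np.random.randn(...) + mu
Examples
>>> np.random.randn()
2.1923875335537315 #random
Two-by-four array of samples from N(3, 6.25):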
Returns random integers from the half-open interval [low, high).
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Generate a 2 x 4 array of ints between 0 and 4, inclusive:
Returns random integers from the closed interval [low, high].
To sample from N evenly spaced floating-point numbers between a and b, use:
a + (b - a) * (np.random.random_integers(N) - 1) / (N - 1.)
Examples
>>> np.random.random_integers(5)
4
>>> type(np.random.random_integers(5))
>>> 2.5 * (np.random.random_integers(5, size=(5,)) - 1) / 4.
array([ 0.625,  1.25 ,  0.625,  0.625,  2.5  ])
Roll two six sided dice 1000 times and sum the results:
>>> d1 = np.random.random_integers(1, 6, 1000)
>>> d2 = np.random.random_integers(1, 6, 1000)
>>> dsums = d1 + d2
Display results as a histogram:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(dsums, 11, normed=True)
>>> plt.show()
Returns random floats in the half-open interval [0.0, 1.0).
To sample from Unif[a, b) with b > a, use:
(b - a) * random_sample() + a
Examples
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
(The official examples are exactly the same as for random_sample.)
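The rescaling rule (b - a) * random_sample() + a can be wrapped in a small function. `uniform` and its `seed` parameter are hypothetical names for this sketch; a seeded `RandomState` is used only to make the run repeatable.

```python
import numpy as np

def uniform(a, b, size, seed=0):
    """Sample from Unif[a, b) by rescaling draws from [0, 1)."""
    rng = np.random.RandomState(seed)
    return (b - a) * rng.random_sample(size) + a

samples = uniform(-5.0, 0.0, (3, 2))
print(samples)
```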
Generates a random sample from a given 1-D array.
Examples
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4])
>>> # This is equivalent to np.random.randint(0, 5, 3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0])
>>> # This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],
      dtype='|S11')
Returns random bytes.
>>> np.random.bytes(10)
' eh\x85\x022SZ\xbf\xa4' #random
Shuffles a sequence in place, changing its own contents. (Like shuffling cards: the order is randomized.)
>>> arr = np.arange(10)
>>> np.random.shuffle(arr)
>>> arr
[1 7 5 2 9 4 3 6 0 8]
This function only shuffles the array along the first index of a multi-dimensional array:
>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.shuffle(arr)
>>> arr
array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])
Returns a random permutation.
>>> np.random.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])
>>> np.random.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12])
>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.permutation(arr)
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])
Multivariate normal distribution.
>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance, points lie on x or y-axis
>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x'); plt.axis('equal'); plt.show()
Normal (Gaussian) distribution
The probability density for the Gaussian distribution is
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
Draw samples from the distribution:
>>> mu, sigma = 0, 0.1  # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
Verify the mean and the variance:
>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, normed=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...          np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
...          linewidth=2, color='r')
>>> plt.show()
==
/disc2/naipeng.ye/langTrack_wangxiao/py-GCN-Triplet-Language-master/tracking/globalAttention_imgLanguage/generator_output