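I. Program Overview

This program counts how many times each word occurs in a text.

It shows the counts of the ten most frequent words in the program window as a bar chart. The x-axis is the rank of a word by number of occurrences, so x = 1 is the most frequent word. The y-axis is the number of occurrences. For example, x = 1, y = 10 means the most frequent word occurs 10 times, and x = 2, y = 9 means the second most frequent word occurs 9 times.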
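The mapping from counts to chart coordinates can be sketched as follows. This is a minimal illustration rather than the program's own code: the `counts` vector is made-up data, and the names `sortedCounts` and `rankOrder` are assumptions introduced here.

```matlab
% Hypothetical counts for 12 distinct words (illustrative data only).
counts = [3 10 1 7 9 2 8 6 5 4 1 2];

% Sort from most to least frequent.
% rankOrder(k) is the index in counts of the word with rank k.
[sortedCounts, rankOrder] = sort(counts, 'descend');

x = 1:10;               % rank: x = 1 is the most frequent word
y = sortedCounts(x);    % number of occurrences at each rank

bar(x, y)
xlabel('Rank by number of occurrences')
ylabel('Number of occurrences')
```

With this data the first bar has height 10 and the second has height 9, matching the reading of the axes described above.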

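The program also displays the five most frequent words.

The program consists of two files: main.m and calledit.m. To start it, run main().

Type the text into the input box and press Enter. The bar chart and the five most frequent words are then displayed.

Program screenshot: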

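II. Characteristics

1. It works only for English text, not for Chinese.

2. The text must contain at least 10 distinct words. The bar chart plots the counts of the 10 most frequent words, so a text with fewer than 10 distinct words causes an error.

3. The program does not recognize punctuation. Words are split on whitespace only, so punctuation stays attached to the neighboring word (for example, "language." and "language" are counted as different words), which affects the final result.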
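Limitations 2 and 3 could be relaxed along the following lines. This is only a sketch under stated assumptions, not part of the original program: the sample string and the names `tokens`, `uniqueWords`, `topWords` and `topCounts` are illustrative, and the approach (extracting runs of letters with `regexp`) replaces the whitespace splitting that the program uses.

```matlab
% Illustrative input containing punctuation.
txt = 'MATLAB allows matrix manipulations. A proprietary language, MATLAB!';

% Keep only runs of letters, so "manipulations." and "language,"
% lose their punctuation. lower() makes the count case-insensitive.
tokens = regexp(lower(txt), '[a-z]+', 'match');

% uniqueWords is sorted alphabetically; idx maps each token to its entry.
[uniqueWords, ~, idx] = unique(tokens);
counts = accumarray(idx(:), 1);

% Rank by count and keep at most 10 entries,
% fewer when the text has fewer than 10 distinct words.
[sortedCounts, order] = sort(counts, 'descend');
k = min(10, numel(sortedCounts));
topWords = uniqueWords(order(1:k));
topCounts = sortedCounts(1:k);
```

One trade-off of this pattern is that anything other than a letter acts as a separator, so "fourth-generation" becomes two words and "C++" is reduced to "c".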

III. Sample Texts

1. i love matlab very much in every class daily life i love matlab very much in every class daily i love matlab very much in every class i love matlab very much in every i love matlab very much in i love matlab very much i love matlab very i love matlab i love i

2. MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, Fortran and Python.

IV. Main Program (main.m)

clf reset
str1 = 'Word Frequency Statistics';
set(gcf,'name',str1,'numbertitle','off')
% Invisible axes covering the whole figure; they hold the title text.
H = axes('unit','normalized','position',[0,0,1,1],'visible','off');
set(gcf,'currentaxes',H)
str = 'Word Frequency Statistics';
text(0.12,0.93,str,'fontsize',20)
h_fig = get(H,'parent');
set(h_fig,'unit','normalized','position',[0.1,0.2,0.7,0.4])
% The five output boxes are shared with calledit through global variables.
global h_edit1 h_edit2 h_edit3 h_edit4 h_edit5
% Axes for the bar chart.
h_axes = axes('parent',h_fig,'unit','normalized','position',[0.1,0.1,0.4,0.75],'xlim',[1 10],'ylim',[0 50],'fontsize',8);
% Input box and its label.
h_text = uicontrol(h_fig,'style','text','unit','normalized','position',[0.55,0.95,0.45,0.05],'horizontal','left','string',{'Enter text:'});
h_edit = uicontrol(gcf,'style','edit','unit','normalized','position',[0.55,0.80,0.45,0.12]);
% Labels and output boxes for the five most frequent words.
h_text1 = uicontrol(h_fig,'style','text','unit','normalized','position',[0.55,0.60,0.16,0.05],'horizontal','left','string',{'Rank 1 word'});
h_edit1 = uicontrol(gcf,'style','edit','unit','normalized','position',[0.55,0.55,0.16,0.05]);
h_text2 = uicontrol(h_fig,'style','text','unit','normalized','position',[0.55,0.50,0.16,0.05],'horizontal','left','string',{'Rank 2 word'});
h_edit2 = uicontrol(gcf,'style','edit','unit','normalized','position',[0.55,0.45,0.16,0.05]);
h_text3 = uicontrol(h_fig,'style','text','unit','normalized','position',[0.55,0.40,0.16,0.05],'horizontal','left','string',{'Rank 3 word'});
h_edit3 = uicontrol(gcf,'style','edit','unit','normalized','position',[0.55,0.35,0.16,0.05]);
h_text4 = uicontrol(h_fig,'style','text','unit','normalized','position',[0.55,0.30,0.16,0.05],'horizontal','left','string',{'Rank 4 word'});
h_edit4 = uicontrol(gcf,'style','edit','unit','normalized','position',[0.55,0.25,0.16,0.05]);
h_text5 = uicontrol(h_fig,'style','text','unit','normalized','position',[0.55,0.20,0.16,0.05],'horizontal','left','string',{'Rank 5 word'});
h_edit5 = uicontrol(gcf,'style','edit','unit','normalized','position',[0.55,0.15,0.16,0.05]);
% Run calledit when Enter is pressed in the input box.
set(h_edit,'callback','calledit(h_edit);')

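V. Subfunction (calledit.m)

function [e,f,g,q] = calledit( sentence1 )
%CALLEDIT Count word frequencies in the text of an edit box.
%   sentence1 is the handle of the edit box that holds the input text.
%   e: distinct words, f: counts sorted in ascending order,
%   g: number of distinct words, q: sort index into e.
global h_edit1 h_edit2 h_edit3 h_edit4 h_edit5
sentence = get(sentence1,'string');
n = 1000;
j = 1;                  % number of distinct words found so far
a = cell(1,n);          % distinct words
b = [];                 % b(i) is the number of occurrences of a{i}
r = sentence;
[w,r] = strtok(r);      % split off the first word into w; r is the rest
a(j) = {w};
b(1) = 1;
while any(r)
    [w,r] = strtok(r);
    if isempty(w)       % only trailing whitespace was left
        break
    end
    d = 0;              % number of known words that differ from w
    for i = 1:j
        k1 = strcmpi(a(i),w);   % case-insensitive comparison
        if k1 == 1
            b(i) = b(i)+1;
        else
            d = d+1;
        end
    end
    if d == j           % w matched no known word, so add it
        j = j+1;
        a(j) = {w};
        b(j) = 1;
    end
end
[b1,index] = sort(b);   % sort the counts in ascending order
e = a;
f = b1;
g = j;
q = index;
t = 1:10;
y = f(j-t+1);           % counts of the 10 most frequent words, largest first
bar(t,y)
set(h_edit1,'string',a(q(j)))
set(h_edit2,'string',a(q(j-1)))
set(h_edit3,'string',a(q(j-2)))
set(h_edit4,'string',a(q(j-3)))
set(h_edit5,'string',a(q(j-4)))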
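The nested loop in calledit compares every word against all distinct words found so far. As a possible alternative, and not the method used in the original program, the same counts can be obtained with `unique` and `accumarray`. The sketch below rebuilds the first sample text in code so that it runs on its own; the names `words`, `txt`, `tokens`, `uniq` and `top5` are illustrative assumptions.

```matlab
% Rebuild the first sample text: each repetition drops the last word.
words = {'i','love','matlab','very','much','in','every','class','daily','life'};
txt = '';
for n = numel(words):-1:1
    txt = [txt, ' ', strjoin(words(1:n), ' ')]; %#ok<AGROW>
end

% Split on whitespace, as strtok does with its default delimiters.
tokens = strsplit(strtrim(txt));

% lower() gives the same case-insensitive matching as strcmpi.
[uniq, ~, idx] = unique(lower(tokens));
counts = accumarray(idx(:), 1).';   % counts(i) is the count of uniq{i}

% Sort from most to least frequent and pick the five top words.
[sortedCounts, order] = sort(counts, 'descend');
top5 = uniq(order(1:5));
```

For this text `sortedCounts` is 10, 9, ..., 1, and `top5` holds "i", "love", "matlab", "very" and "much", which agrees with the axis example in section I.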