corpus for computational linguistics
Submitted on Sunday, 27 September, 2009 - 20:49
Hi,
I recently discovered jEdit and started working with jEdit macros.
This site's resources have been very helpful in writing
a search-and-replace macro for HTML documents for corpus purposes.
I would like to clean HTML files and save them as plain-text (.txt) files.
I am familiar with classes and methods from C++, but not yet
with BeanShell. Are there any references for BeanShell syntax and methods?
I would appreciate any help, in the form of macro code, with the following:
1- a macro to delete the first x lines of a document
2- a macro to keep only the contents of the title, paragraph, or body elements of an HTML document
Any help in this matter, through code or references, would be greatly appreciated. Thanks.