stingy regular expression matching

Submitted by kellner on Monday, 3 April, 2006 - 20:18

Hi,

I'm trying to clean up a document that was automatically converted to LaTeX. The document contains strings like these:

"(\textit{sahaprat}\textit{ī}\textit{tiniyama}). [Reply:] Not even for the piece of ground and the pot is it like this! However, the fact that, if both exist, there is no restriction of the cognition to [only] one image (\textit{ekar}\textit{ū}\textit{papratī}\textit{tiniyamaviraha}),"

(never mind the meaning :-) )

The idea is to get rid of all the unnecessary instances of \textit{}. So I thought I would search for instances of \textit{} that immediately follow one another and merge them.

I entered this in the (regexp) search box: \\textit\{(.*?)\}\\textit\{(.*?)\}
and tried replacing it with: \\textit\{$1$2\}

The problem is that the stingy (non-greedy) operator ("?") is not stingy enough: apparently the search looks ahead for a second instance of italicization *anywhere* later in the text; it does not restrict itself to instances of *contiguous* italics.

Thus, the regexp also finds:

\textit{some italicized stuff} Oh and here is lots of writing in between! This should not be part of what is found, but it is! \textit{next italicized stuff}.
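The behaviour can be reproduced outside the editor. The following is only a minimal sketch in Python, whose `re` module treats `.*?` the same way for this pattern; the sample string is made up for illustration. It suggests that the non-greedy group is allowed to run across a closing brace, because `.` also matches `}`, until it reaches the first place where `}\textit{` actually occurs.

```python
import re

# The pattern from the search box, unchanged.
LAZY = re.compile(r"\\textit\{(.*?)\}\\textit\{(.*?)\}")

# Hypothetical sample: one isolated \textit{}, then a contiguous pair.
sample = r"\textit{a} filler text \textit{b}\textit{c}"

m = LAZY.search(sample)
first_group = m.group(1)   # runs past the first closing brace
second_group = m.group(2)

print(repr(m.group(0)))
print(repr(first_group))
print(repr(second_group))
```

Here the first group ends up containing the filler text and the start of the next `\textit{`, so the non-greedy operator only makes the group as short as possible from a fixed starting point; it does not stop it from crossing a `}`.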

Somehow I can't bring myself to believe that this is how the regexp should be understood. Or am I barking up the wrong tree?

I also tried a positive lookahead \\textit\{(.*?)\}(?=\\textit\{(.*?)\}), but this produced the same unwanted results.
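One possible way around this, sketched in Python rather than in the editor's search dialog, is to replace `.*?` with a negated character class such as `[^{}]*`, so that a group can never cross a brace. The sketch assumes that the italicized text itself contains no nested braces; the names `RUN`, `PIECE` and `merge_italics` are made up for this example.

```python
import re

# Two or more \textit{...} groups with nothing in between.
# [^{}]* cannot cross a brace, so each group stays inside its own pair.
RUN = re.compile(r"(?:\\textit\{[^{}]*\}){2,}")
PIECE = re.compile(r"\\textit\{([^{}]*)\}")


def merge_italics(text: str) -> str:
    """Merge each run of contiguous \\textit{} groups into a single one."""
    def join_run(match: re.Match) -> str:
        pieces = PIECE.findall(match.group(0))
        return r"\textit{" + "".join(pieces) + "}"
    return RUN.sub(join_run, text)


print(merge_italics(r"(\textit{sahaprat}\textit{ī}\textit{tiniyama})"))
print(merge_italics(r"\textit{some stuff} text in between \textit{next stuff}."))
```

In a plain search-and-replace dialog the same idea would presumably be `\\textit\{([^{}]*)\}\\textit\{([^{}]*)\}`, applied repeatedly until nothing more is found, since each pass only merges pairs.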

Any advice would be greatly appreciated,

Thanks in advance,

best regards,
