I'm not talking about the -o option. POSIX says:

The search for a matching sequence starts at the beginning of a string and stops when the first sequence matching the expression is found, where "first" is defined to mean "begins earliest in the string". If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched. For example, the BRE "bb*" matches the second to fourth characters of the string "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten characters of the string "weeknights".

I want to verify what is said in POSIX and in this tutorial (regTutorialSite):

A POSIX-compliant engine will still find the leftmost match. If you apply Set|SetValue to Set or SetValue once, it will match Set.

How do I "apply once"? When I run grep -o, the result is two strings, Set and SetValue, not just the one leftmost match. That is, I read one thing, but in practice I get something else. So, how can I see which string was matched by the regex?

(Perhaps the question was formulated incorrectly or could have been better)

  • Is this just for self-education about grep, or do you want to actually use the extracted matches in a script? If the former, then self-education is great. If the latter, then you'll be much better off using a language like awk or perl, which are designed for exactly that kind of task. Trying to do it in shell with grep and command substitution will be slow and awkward. For example, perl can return ALL matches in an array, making it easy to iterate over the results rather than repeating the search multiple times.
    – cas
    Commented 18 hours ago
  • So, why does it matter if regular grep finds the first possible match on the line, or one of the possible other matches? The default operation is defined to just print the line, regardless of what exactly matched. On the other hand, is there any reason to doubt that grep's internal matching logic finds the first match, as usual? Doing otherwise would mean grep would need a different regex engine from everything else, and for no real use, since it just prints the whole line anyway.
    – ilkkachu
    Commented 14 hours ago
  • As for grep -o, I think it's defined to print all matches, so again, there's no conflict between what you saw and what is documented. Grep isn't the same thing as the regex engine in the C library, and it's not reasonable to expect a description of the library functions would describe accurately the behavior of grep.
    – ilkkachu
    Commented 14 hours ago

1 Answer

grep is named after the g/re/p command of the ed editor. It's about printing the lines that match the given regular expression.

What portion of the line matches is not relevant then.

The GNU implementation has added these two extensions over the standard:

  • -o that prints all the non-empty matches of the regexp
  • --color that highlights all the matches

But both match the regexp differently from how grep without them does: they carry on looking for more matches after the first one, in a manner similar to ed's s/pattern/<&>/g command (g being the key here).
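As a small illustration of that difference, here is a sketch assuming a grep that supports -o (such as GNU or BSD grep). The sample input line is made up; the BRE bb* is the one from the POSIX text quoted in the question:

```shell
# Made-up sample line containing two separate runs of "b".
line='abbbc abbc'

# grep -o keeps scanning after the first match: one match per output line.
all_matches=$(printf '%s\n' "$line" | grep -o 'bb*')

# sed with the g flag does the same kind of repeated matching...
marked_all=$(printf '%s\n' "$line" | sed 's/bb*/<&>/g')

# ...while without g it stops after the first (leftmost) match.
marked_first=$(printf '%s\n' "$line" | sed 's/bb*/<&>/')

printf '%s\n' "$all_matches" "$marked_all" "$marked_first"
```

The first two results show every match on the line; only the last one shows the single leftmost match.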

No grep implementation that I know of has a way to output the one and only match that grep without -o/--color matches on.

You'd need to use other tools such as sed, awk or perl.

For instance, to see what grep regex matches on, you could do:

sed -n 's/regex/<&>/p'

Which would print the matching lines with <...> around the matched portion. To print only the matched portion:

sed -n '
  /regex/ {
    s//\
&\
/
    s/^.*\n\(.*\)\n.*$/\1/p
  }'
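For example, here is a sketch that runs both sed commands on the BRE example from the POSIX text quoted in the question (bb* applied to abbbc). The variable names are only for illustration:

```shell
# Mark the matched portion of the line with <...>.
marked=$(printf 'abbbc\n' | sed -n 's/bb*/<&>/p')

# Print only the matched portion: wrap the match in newlines,
# then keep what sits between the two newlines.
matched=$(printf 'abbbc\n' | sed -n '
  /bb*/ {
    s//\
&\
/
    s/^.*\n\(.*\)\n.*$/\1/p
  }')

printf '%s\n' "$marked" "$matched"
```

That prints a<bbb>c and then bbb, that is, the second to fourth characters of the string, as the standard describes.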

For grep -E regex:

awk 'match($0, "regex") {print substr($0, RSTART, RLENGTH)}'

(awk regexps are similar to those supported by grep with -E; alternatively, use the same approach as above with sed -E where supported)
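Applied to the example from the question, that could look like the sketch below. Printing RSTART and RLENGTH as well is my addition, to show where the match sits in the line:

```shell
# Apply Set|SetValue once to "Set or SetValue" and report
# the start position, the length and the text of the match.
matched=$(printf 'Set or SetValue\n' |
  awk 'match($0, "Set|SetValue") {print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)}')

printf '%s\n' "$matched"
```

That prints 1 3 Set: the leftmost match starts at the first character, and only Set can match there.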

For grep -P regex:

perl -lne 'print $& if /regex/'

(grep -P, which originated in GNU grep, is meant for perl regular expressions, but it is implemented via PCRE2 (formerly PCRE), whose regular expressions are not fully equivalent to perl's. ast-open grep has its own variant of perl-like regular expressions.)

With the perl regexps (grep implementations that support both -P and -o such as GNU grep when built with optional PCRE2 support), you can also do:

grep -Po '^.*?\K(?:regex)'

The ^.*? matches as few characters as possible starting with the start of the line which prevents the regex from matching more than once. \K marks the start of what's to be Kept (and output with -o) from that. The (?:...) grouping is in case there's an alternation operator in the regex (as in a|b), avoiding the capturing variant ((...)) in case the regex has some \1 or (?1)... operators.

Beware it doesn't report empty matches.
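To see the effect of the ^.*?\K prefix, here is a sketch assuming a grep with both -P and -o (such as GNU grep built with PCRE2 support); the input line is made up and the sketch checks for -P support first:

```shell
line='abbbc abbc'

# Only run the comparison if this grep was built with -P support.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  # Without the prefix, -o reports every match on the line.
  every=$(printf '%s\n' "$line" | grep -Po 'bb*')
  # With the prefix, only the first match can be reported.
  first=$(printf '%s\n' "$line" | grep -Po '^.*?\K(?:bb*)')
  printf '%s\n' "$every" "$first"
else
  echo 'this grep has no -P support' >&2
fi
```

The first command reports both bbb and bb; the second one reports bbb only.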
