
Magnolia Community Forums: Get help with Magnolia: Magnolia 5.3.12 search result problems


  • armasescuroxana
    Full name: Armasescu Roxana
    Magnolia 5.3.12 search result problems
    #1 by armasescuroxana on Nov 11, 2016 11:22:40 AM

    Hello,

    I have a problem with the way Magnolia (5.3.12) indexes page content. It indexes not only the page text but also the CSS classes, which leads to inaccurate search results.

    The search index configuration:
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index" />
      <param name="useCompoundFile" value="true" />
      <param name="minMergeDocs" value="100" />
      <param name="volatileIdleTime" value="3" />
      <param name="maxMergeDocs" value="100000" />
      <param name="mergeFactor" value="10" />
      <param name="maxFieldLength" value="10000" />
      <param name="bufferSize" value="10" />
      <param name="cacheSize" value="1000" />
      <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
      <param name="respectDocumentOrder" value="true" />
      <param name="resultFetchSize" value="2147483647" />
      <param name="extractorPoolSize" value="3" />
      <param name="extractorTimeout" value="100" />
      <param name="extractorBackLogSize" value="100" />
      <param name="enableConsistencyCheck" value="false" />
      <param name="forceConsistencyCheck" value="false" />
      <param name="autoRepair" value="false" />
      <param name="onWorkspaceInconsistency" value="log" />
    </SearchIndex>

    The SQL query: QUERY_PATTERN = "select * from nt:base where jcr:path like ''{0}/%'' and contains(*, ''{1}'') order by jcr:path";

    Is there something I am missing that would let me search only the actual content and not the whole HTML page?

    Thanks,
    Roxana
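For reference, this is how the QUERY_PATTERN above expands. In java.text.MessageFormat a doubled single quote ('') produces a literal quote, so {0} and {1} end up inside SQL string literals. The contains(*, ...) clause searches the full text of every property on each node, which is why markup stored inside a rich-text property can produce a match. This is a minimal sketch, assuming the pattern is filled with MessageFormat.format; the class QueryPatternDemo and the escapeLiteral helper are hypothetical, and doubling quotes only protects the SQL string literal, not the full-text syntax inside contains().

```java
import java.text.MessageFormat;

public class QueryPatternDemo {
    // Pattern from the post. In MessageFormat, '' is a literal single quote,
    // so each placeholder is wrapped in single quotes in the resulting SQL.
    static final String QUERY_PATTERN =
        "select * from nt:base where jcr:path like ''{0}/%'' and contains(*, ''{1}'') order by jcr:path";

    // Hypothetical helper: escape a value for a single-quoted JCR SQL literal
    // by doubling any single quotes it contains.
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    // Fill the pattern; MessageFormat inserts String arguments verbatim.
    static String buildQuery(String basePath, String searchTerm) {
        return MessageFormat.format(QUERY_PATTERN, escapeLiteral(basePath), escapeLiteral(searchTerm));
    }

    public static void main(String[] args) {
        // prints: select * from nt:base where jcr:path like '/home/%' and contains(*, 'magnolia') order by jcr:path
        System.out.println(buildQuery("/home", "magnolia"));
    }
}
```

Without the escaping step, a search term such as it's would close the string literal early and break the query.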

  • fgrilli
    Full name: federico grilli
    Re: Magnolia 5.3.12 search result problems
    #2 by fgrilli on Nov 11, 2016 12:28:07 PM

    Hi Roxana,

    Jackrabbit full-text indexing should index only text properties, not the rendered HTML page. Those tags most likely come from a rich-text component that stores its content in JCR without stripping extraneous markup. One way to deal with this may be what was introduced in Magnolia 5.4, namely a custom excerpt provider class: https://git.magnolia-cms.com/projects/PLATFORM/repos/main/browse/magnolia-core/src/main/java/info/magnolia/jackrabbit/lucene/SearchHTMLExcerpt.java?at=refs%2Fheads%2Fmagnolia-5.4.x

    See also https://documentation.magnolia-cms.com/display/DOCS/Search

    Hope this helps,

    Federico
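Since the unwanted matches come from markup stored in the rich-text property itself, one possible workaround on 5.3 is to post-filter the query hits: keep a hit only if the search term still occurs once the tags are stripped from the stored value. This is a minimal sketch, assuming you read the property value as a String yourself; VisibleTextFilter, stripTags and matchesVisibleText are hypothetical names, not Magnolia or Jackrabbit APIs, and the regex is a naive tag matcher rather than a real HTML parser.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class VisibleTextFilter {
    // Naive tag matcher: anything between '<' and '>' counts as markup.
    // Does not handle comments, CDATA, or '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Replace every tag with a space so adjacent words do not merge.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ");
    }

    // True only if the term occurs in the text outside of tags (case-insensitive).
    static boolean matchesVisibleText(String storedValue, String term) {
        String visible = stripTags(storedValue).toLowerCase(Locale.ROOT);
        return visible.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String stored = "<p class=\"highlight\">Magnolia search tips</p>";
        System.out.println(matchesVisibleText(stored, "highlight")); // false: only in the class attribute
        System.out.println(matchesVisibleText(stored, "search"));    // true: part of the visible text
    }
}
```

Lucene matches analyzed tokens while this check is a plain substring test, so the two can disagree (for example on partial words), and HTML entities such as &amp; are left undecoded. Treat it as a rough filter applied after the full-text query, not a change to what gets indexed.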
