Some entities imported via search:import are not indexed (missing records) #372

Open
quentint opened this issue Mar 14, 2023 · 2 comments

@quentint

  • Symfony version: v6.2.7
  • Algolia Search Bundle version: 6.0.0
  • Algolia Client Version: N/A
  • Language Version: PHP 8.1.14 (cli)

Description

When importing entities with search:import, the command output reports the expected counts, but some records are missing when browsing the index.

Here is the command output:

> bin/console search:import
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 160 / 160 App\Entity\MediaTranslation entities into quentin_media index
Done!

I'd then expect my index to contain 14 * 500 + 160 = 7160 items, but only 5216 exist:

image

Clearing the index and importing again yields a different record count each time (varying by roughly ±5%).
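
To make the gap explicit, here is a minimal sketch that sums the "Indexed X / Y" lines from the output above and compares the total with the 5216 records seen on the dashboard. The helper name `sumIndexedCounts` is hypothetical and only for illustration; it is not part of the bundle. With the figures from this report it gives 7160 expected and 1944 missing, a bit over a quarter of the records.

```php
<?php
// Sum the "Indexed X / Y" lines printed by search:import and compare the
// total with the record count shown on the Algolia dashboard.

/**
 * Hypothetical helper: returns the total number of entities the command
 * reported as indexed.
 */
function sumIndexedCounts(string $output): int
{
    // Capture X from every line of the form "Indexed X / Y ...".
    preg_match_all('/^Indexed (\d+) \/ \d+ /m', $output, $matches);

    return array_sum(array_map('intval', $matches[1]));
}

// Command output from the report: 14 batches of 500, then one of 160.
$output = str_repeat(
    "Indexed 500 / 500 App\\Entity\\MediaTranslation entities into quentin_media index\n",
    14
) . "Indexed 160 / 160 App\\Entity\\MediaTranslation entities into quentin_media index\nDone!\n";

$expected = sumIndexedCounts($output);
$actual = 5216; // record count seen on the dashboard
$missing = $expected - $actual;

printf(
    "expected %d, found %d, missing %d (%.1f%%)\n",
    $expected,
    $actual,
    $missing,
    100 * $missing / $expected
);
```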

Here's my configuration:

algolia_search:
    prefix: '%algolia_search_prefix%'
    indices:
        - name: media
          class: App\Entity\MediaTranslation

And here's the index settings file (created with `search:settings:backup`):

{
    "minWordSizefor1Typo": 4,
    "minWordSizefor2Typos": 8,
    "hitsPerPage": 20,
    "maxValuesPerFacet": 100,
    "version": 2,
    "searchableAttributes": [
        "unordered(media.id)",
        "unordered(title)",
        "unordered(tags)",
        "unordered(description)",
        "unordered(features)",
        "unordered(goals)",
        "unordered(more)"
    ],
    "numericAttributesToIndex": null,
    "attributesToRetrieve": null,
    "unretrievableAttributes": null,
    "optionalWords": null,
    "attributesForFaceting": [
        "locale",
        "media.type",
        "status",
        "filterOnly(tags)",
        "filterOnly(title)"
    ],
    "attributesToSnippet": null,
    "attributesToHighlight": null,
    "paginationLimitedTo": 1000,
    "attributeForDistinct": null,
    "exactOnSingleWordQuery": "attribute",
    "ranking": [
        "typo",
        "geo",
        "words",
        "filters",
        "proximity",
        "attribute",
        "exact",
        "custom"
    ],
    "customRanking": null,
    "separatorsToIndex": "",
    "removeWordsIfNoResults": "none",
    "queryType": "prefixLast",
    "highlightPreTag": "<em>",
    "highlightPostTag": "<\/em>",
    "snippetEllipsisText": "",
    "alternativesAsExact": [
        "ignorePlurals",
        "singleWordSynonym"
    ],
    "sortFacetValuesBy": "count",
    "renderingContent": {
        "facetOrdering": {
            "facets": {
                "order": [
                    "locale",
                    "media.type",
                    "status"
                ]
            },
            "values": {
                "locale": {
                    "sortRemainingBy": "alpha"
                },
                "media.type": {
                    "sortRemainingBy": "alpha"
                },
                "status": {
                    "sortRemainingBy": "alpha"
                }
            }
        }
    }
}

I tried changing the batchSize but the issue remained.
I used to have an index_if in there, but removed it and the issue remained.

When running the search:import command and regularly refreshing the index on the Algolia dashboard, the "No. records" evolves like so (that's only an example, values change if I re-run this on a clear index):

  • 500
  • 1,000
  • 1,500
  • 2,000
  • 2,253
  • 2,525
  • 3,025
  • (...)

As you can see, things look OK at first, but then get a bit crazy around the 2,000/2,500 mark.
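
The irregularity is easier to see as the increase between successive observations. The sketch below is only illustrative: the helper name `countDeltas` is hypothetical, the batch size of 500 is taken from the command output, and dashboard refreshes do not necessarily line up with batch boundaries, so a short step is a hint rather than proof that a given batch lost records.

```php
<?php
// Differences between successive record counts observed on the dashboard.

/**
 * Hypothetical helper.
 *
 * @param int[] $counts record counts in the order they were observed
 * @return int[] increase between each observation and the previous one
 */
function countDeltas(array $counts): array
{
    $deltas = [];
    for ($i = 1, $n = count($counts); $i < $n; $i++) {
        $deltas[] = $counts[$i] - $counts[$i - 1];
    }

    return $deltas;
}

// Values listed in the report (one example run).
$observed = [500, 1000, 1500, 2000, 2253, 2525, 3025];
$deltas = countDeltas($observed);

// Steps smaller than the assumed batch size of 500.
$short = array_filter($deltas, fn (int $d): bool => $d < 500);

print_r($deltas);
print_r($short);
```

For this run, the first three steps are 500 each, followed by two short steps of 253 and 272, then a full step of 500 again.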

Steps To Reproduce

Unfortunately this is hard to reproduce, because I can't pinpoint the origin of the issue (and the randomness makes it even stranger) 🙁

I tried looking at the Symfony logs to see if some error appeared there, but found nothing.

What could prevent records from appearing in my index?

@quentint

quentint commented Mar 14, 2023

Digging a bit more, I can confirm the issue comes from this repo (and not algolia/algoliasearch-client-php), because I wrote this simple command that uses the client directly and works as intended:

<?php
// src/Command/MediaIndexCommand.php

namespace App\Command;

use Algolia\AlgoliaSearch\SearchClient;
use App\Entity\MediaTranslation;
use App\Serializer\Normalizer\MediaTranslationNormalizer;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

#[AsCommand(
    name: 'app:media:index',
    description: 'Index media translations',
)]
class MediaIndexCommand extends Command
{

    public function __construct(private readonly EntityManagerInterface $manager, private readonly MediaTranslationNormalizer $normalizer)
    {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);

        $client = SearchClient::create('...', '...');
        $index = $client->initIndex('quentin_media');
        $index->clearObjects();

        // Load every translation and split the list into batches of 500.
        $translations = $this->manager->getRepository(MediaTranslation::class)->findAll();
        $chunks = array_chunk($translations, 500);

        foreach ($chunks as $chunkIndex => $chunk) {
            $io->info("Chunk $chunkIndex");
            // Normalize each entity and set its objectID explicitly from the entity ID.
            $objects = array_map(fn(MediaTranslation $translation) => [...$this->normalizer->normalize($translation, 'searchableArray'), 'objectID' => $translation->getId()], $chunk);
            $index->saveObjects($objects);
        }

        return Command::SUCCESS;
    }
}

image

I hope this helps.

@quentint

Still investigating... Looking at the logs generated with `Algolia\AlgoliaSearch\Log\DebugLogger::enable();`, I don't see anything unusual.

Also, I don't understand how/where the bundle does anything different from my own command (apart from supporting more cases) 🤔
