Mac OS X Apache Tika automation

3/8/2023

We are using Tika 0.5 to parse files that are added to a Lucene index. If we assign multiple threads to the parsing task, we find that the AutoDetectParser.parse() method occasionally fails to return. In our case, it appears that a HashMap inside Xerces gets corrupted, causing an infinite loop inside HashMap.get(). This seems to be a concurrency problem; we have not seen the issue when running single-threaded.

Other posts have stated that AutoDetectParser is thread-safe. A quick look at the source code shows that an AutoDetectParser holds a MimeTypes, which holds an XmlRootExtractor, which holds a SAXParser. As a result, a single SAXParser instance can end up simultaneously parsing documents in multiple threads.
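One common way to avoid sharing a SAXParser across threads is to give each thread its own instance. The sketch below uses only the JDK, not Tika itself, and is not a description of how Tika fixed or should fix the issue. The class and method names (PerThreadSaxParsers, rootElementName, rootNamesConcurrently) are hypothetical and exist only for illustration. It extracts the root element name of small XML documents on a thread pool, with a ThreadLocal ensuring that no parser instance is used by two threads at once.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: one SAXParser per thread instead of one shared instance.
class PerThreadSaxParsers {

    // Each thread lazily creates its own parser on first use.
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create SAXParser", e);
        }
    });

    // Returns the local name of the root element of the given XML text.
    static String rootElementName(String xml) throws Exception {
        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName; // remember only the first element seen
                }
            }
        };
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        // PARSER.get() returns the calling thread's private parser.
        PARSER.get().parse(new ByteArrayInputStream(bytes), handler);
        return root[0];
    }

    // Parses all documents on a fixed-size pool and returns root names in input order.
    static List<String> rootNamesConcurrently(List<String> docs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> rootElementName(doc);
                futures.add(pool.submit(task));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> future : futures) {
                names.add(future.get());
            }
            return names;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add("<doc" + i + "><child/></doc" + i + ">");
        }
        List<String> names = rootNamesConcurrently(docs, 4);
        System.out.println(names.size() + " documents parsed");
    }
}
```

In a Tika-based indexer, a similar effect could presumably be achieved by keeping a separate parser object per worker thread, at the cost of some extra memory. Whether that is sufficient depends on what else the parser instances share internally, which this sketch does not address.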